McGraw-Hill - Delivering Business Intelligence with Microsoft SQL Server 2008 (2009)02

Chia sẻ: Hoang Nhan | Ngày: | Loại File: PDF | Số trang:10

0
154
lượt xem
29
download

McGraw-Hill - Delivering Business Intelligence with Microsoft SQL Server 2008 (2009)02

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

McGraw-Hill - Delivering Business Intelligence with Microsoft SQL Server 2008 (2009)02

Chủ đề:
Lưu

Nội dung Text: McGraw-Hill - Delivering Business Intelligence with Microsoft SQL Server 2008 (2009)02

  1. 22 Delivering Business Intelligence with Microsoft SQL Server 2008 Manufacturing Automation The manufacturing automation system tracks the materials used to make each product. It also stores which products are manufactured on which production lines. Finally, this system tracks the number of items manufactured during each shift. The manufacturing automation system uses a proprietary data-storage format. Data can be exported from the manufacturing automation system to a comma-delimited text file. This text file serves as the source for loading the manufacturing data into the business intelligence systems. Order Processing The order processing system manages the inventory amounts for all products. It tracks wholesale orders placed by non–Max Min retailers. The system also records product amounts sold through the Max Min retail stores and the Max Min online store to maintain inventory amounts. The order processing system tracks order fulfillment, including product shipping. It also generates invoices and handles the payment of those invoices. In addition, this system records any products returned from the retailer. The order processing system uses a Microsoft SQL Server database as its backend. Point of Sale The point of sale (POS) system manages the cash registers at each of the five Max Min–owned retail stores. This system also tracks the inventory at each retail store using Universal Product Code (UPC) barcode stickers placed on each item. The POS system handles both cash and credit card transactions. It also tracks information on any products returned by the customer. Information from each of the five POS systems is exported to an XML file. This XML file is transferred nightly, using File Transfer Protocol (FTP), to a central location. These XML files serve as the source for loading the POS data into the business intelligence systems. MaxMin.com The MaxMin.com online store is an ASP.NET application. It uses SQL Server as its backend database. All sales through the online store are paid with a credit card. All customers of the online store must provide name, address, phone number, and e-mail address with each purchase. The online store tracks the shipping of orders. It also handles any products returned by customers. Finally, the online store saves information on product promotions and discounts that are run on the store site.
  2. C h a p t e r 2 : M a k i n g t h e M o s t o f W h a t Yo u ’ v e G o t 23 Accounting The accounting system tracks all the financial transactions for Max Min, Inc. This includes the purchase of raw materials for manufacturing. The accounting system uses a SQL Server database for its backend. Building the Foundation In Chapter 3, you will learn more about the foundations of our business intelligence systems. We explore possible sources for our business intelligence data. We also look at what the structure of those data sources might look like.
  3. This page intentionally left blank
  4. Chapter 3 Seeking the Source— The Source of Business Intelligence In This Chapter c Seeking the Source c The Data Mart c Snowflakes, Stars, and Analysis Services
  5. 26 Delivering Business Intelligence with Microsoft SQL Server 2008 Planning ahead is a good idea. It wasn’t raining when Noah built the ark. —Anonymous I n the previous chapter, we discussed the various ways business intelligence can aid in making effective business decisions. We also looked at the characteristics of the business intelligence used at different levels within our organizations. Finally, we were introduced to Maximum Miniatures, Incorporated, the source for all sample data in this book. In this chapter, we begin planning the database structures to serve as the source of our business intelligence. In some cases, we can extract our business intelligence information directly from the same database used to store the data from our daily business operations. In many cases, however, we need to move that data into another location before we can use it as business intelligence. This “other location” is known as a data mart. Seeking the Source We have seen that business intelligence is important for effective decision making in our organizations. This, however, leads to a big question. Just where is this business intelligence going to come from? Is business intelligence a form of corporate espionage? Do we need to send up spy satellites to watch our competitors and tap the phone lines of our clients? Should we be hiring secret agents to infiltrate our rivals’ facilities? Of course not! Does business intelligence require us to take the pulse of the people? Do we need to commission large studies of our potential customers? Do we need to conduct a survey to determine what people are thinking about our products or services? While some business intelligence may come from customer satisfaction surveys or market research, the customer’s buying behavior is a better gauge of their tendencies and satisfaction. At any rate, this is not what we are going to focus on in this book. The bulk of business intelligence for most organizations comes from something they already have: their transactional data. Transactional Data Most organizations need to keep track of the things they do to conduct their business. Orders taken, products produced, services rendered, payments received from clients, and payments made to vendors are all interactions that usually result in one or more entries in some type of data store. Each of these interactions is a business transaction, so we refer to this as transactional data.
  6. Chapter 3: Seeking the Source—The Source of Business Intelligence 27 Definition Transactional data is the information stored to track the interactions, or business transactions, carried out by an organization. The business transactions of an organization need to be tracked for that organization to operate. Payments must be collected for products and services. Payments must be made for goods and services received. Orders and service requests need to be fulfilled. In general, the organization needs to keep track of what it has done and what it needs to do. When these transactions are stored on and managed by computers, we refer to this as online transaction processing, or OLTP. Definition Online transaction processing (OLTP) systems record business interactions as they happen. They support the day-to-day operation of an organization. The sum of these transactions stored in OLTP systems is the history of an organization. This transactional data contains the raw numbers necessary to calculate the measures we discussed in the previous chapter. Here, then, is the data we need to create our business intelligence. Difficulties Using Transactional Data for Business Intelligence OLTP systems are the treasure chests holding the raw data we need to calculate measures and create business intelligence. Problems arise, however, when we try to extract these nuggets of raw data from our OLTP systems. Let’s take a look at some of the difficulties. The Nature of the Beast Well-designed OLTP systems are optimized for efficiently processing and storing transactions. This means breaking data up into small chunks using the rules of database normalization. This allows OLTP systems to process a number of transactions at the same time without one transaction getting in another’s way. Information of this type is best stored in a relational database. The measures we are using for business intelligence, on the other hand, are not designed to reflect the events of one transaction, but to reflect the net result of a number of transactions over a selected period of time. Business intelligence measures are often aggregates of hundreds, thousands, or even millions of individual transactions. Designing a system to provide these aggregates efficiently requires an entirely different set of optimizations.
  7. 28 Delivering Business Intelligence with Microsoft SQL Server 2008 Definition An aggregate is a number that is calculated from amounts in many detail records. An aggregate is often the sum of many numbers, although it can also be derived using other arithmetic operations or even from a count of the number of items in a group. For example, the total amount invoiced to a client in a given year is the aggregate sum of all the invoice amounts for that client in the given year. OLTP systems, because of the way they are designed, are usually not good at delivering large aggregates. This is not what they were intended to do. We need to look to a different type of data storage optimization to make these aggregates work efficiently. Interfering with Business Operations OLTP systems are used by our organizations to support their daily operations. In many cases, the organizations’ operation depends on the performance of these systems. If the order processing system or the client management system becomes too bogged down, our organizations can grind to a halt. We’ve already discussed the fact that OLTP systems are not good at delivering the aggregates needed for business intelligence. When OLTP systems are called on to produce such aggregates, they typically use a large amount of processing power and take a long time to produce a result. It is also possible that a large number of records will be locked while the aggregate is being produced, rendering those records unavailable to participate in transactional processing. Either of these two events can have a serious impact on transactional processing efficiency. In other words, requiring an OLTP system to create business intelligence aggregates can tax the system. This can have a detrimental effect on our organizations’ daily operations. Archiving Because OLTP systems are concerned with the day-to-day operations, they aren’t too worried about data from the distant past. These systems may only save data for a relatively short period of time (and/or the data may only represent the current state, such as current quantity in stock). The data may be saved for a year, and then a year-end process may remove it from the database. It may be archived in another format, a text file or a database backup file, or it might simply be deleted. Whether deleted or archived, the data is no longer easily accessible. OLTP systems use this archive process to ensure that the system continues to operate efficiently. If a transaction table contains too many records, the OLTP system can become bogged down and begin to operate slowly. Archiving allows an OLTP system to stay lean and mean. This archiving causes problems for business intelligence. When we are looking for trends in our measures, we want to compare last year’s numbers to this year’s numbers. We may even want to compare numbers over several years of operation. This is hard to do when the data from past years has been archived or deleted.
  8. Chapter 3: Seeking the Source—The Source of Business Intelligence 29 Divided They Stand Our organizations probably use a number of different OLTP systems to manage different aspects of their operations. One system is used for order processing, a different system for accounting, another for manufacturing, and still another for personnel. As we saw in the previous chapter, Maximum Miniatures, Incorporated has five different systems that can provide data for business intelligence. Even with the move toward integrated Enterprise Resource Planning (ERP) systems, it is unlikely that all of an organization’s transactional data will be in one location. The measures used to provide business intelligence, on the other hand, do not respect these lines of separation. Instead, they treat the organization as a whole. For example, a reasonable measure to require is the profit margin for a particular product. To calculate this measure, we need the list of raw materials from the manufacturing system, the cost of those materials from the accounting system, the cost of labor required to produce the product from the time entry system, and the amount paid for the product from the order processing system. To calculate this type of a measure, then, we need to combine data across systems to get what we need. Aside from the necessity for communication between systems, this need to cross systems leads to another problem. Each of these systems maintains its own set of product numbering schemes, codes, and calendars. The same product may be known as “12593” in the manufacturing system and “SD125RDS” in the order processing system. The payroll system may work on two-week pay periods, while the accounting system works on fiscal months. When data from these disparate systems is brought together, we need to find some common ground. The Data Mart A number of problems can result when we try to use our organizations’ OLTP systems as the source for our business intelligence. What we need to do is take the information stored in these OLTP systems and move it into a different data store. This intermediate data store can then serve as the source for our measure calculations. We need to store the data so it is available for our business intelligence needs somewhere outside of our OLTP systems. When data is stored in this manner, it is referred to as a data mart. Definition A data mart is a body of historical data in an electronic repository that does not participate in the daily operations of the organization. Instead, this data is used to create business intelligence. The data in the data mart usually applies to a specific area of the organization.
  9. 30 Delivering Business Intelligence with Microsoft SQL Server 2008 note In this book, we discuss the creation of data marts, rather than the perhaps more familiar term, data warehouse. Data warehouses tend to be large, one-stop-shopping repositories where all the historical data for the organization would be stored. Nothing is wrong with this as a concept; however, attempting to create a data warehouse often led to huge, multiyear technology projects that were never quite finished or were outdated when they finally did get done. In this book, we concern ourselves with creating data marts—smaller undertakings that focus on a particular aspect of an organization. Features of a Data Mart Because the data mart is meant to serve as a source for business intelligence rather than managing the organization’s day-to-day transactions, it is not designed the same as an OLTP database. Instead of being built around the rules of normalization, data marts are built for speed of access. A data mart is still a relational database, but it is designed to require fewer table joins when data is output for analysis and reporting. In a data mart, it is acceptable to have data repeated (denormalized) for the sake of speed. When designing a data mart, the rules of normalization are replaced by a different method of design organized around “facts.” These new design approaches are called stars and snowflakes. We discuss stars and snowflakes in the sections “The Star Schema” and “The Snowflake Schema.” Stars and snowflakes may seem like the stuff of children’s fantasies, but, in reality, they provide quite grownup and down-to-earth approaches to creating information that is quick and easy to access. Not Real-Time Data OLTP systems store data from business transactions as they occur throughout the day. Data marts, on the other hand, are updated at set intervals. Data is copied from the OLTP systems periodically and written to the data mart. This is known as a data load. Because the data mart exists separately from the OLTP systems, accessing the data mart for business intelligence information does not put any stress on the transactional systems vital to the business’ operation. The only exception to this is during the data load. During the data load, the OLTP systems may have to work hard to prepare the data for copying to the data mart. The good news here is the data load is an automated process that can be scheduled to run during off-peak hours. As we discussed in Chapter 2, information in a data mart has some latency. In most cases, some time elapses between the moment a transaction is completed and the moment when the transaction is copied to the data mart. If a data mart load is
  10. Chapter 3: Seeking the Source—The Source of Business Intelligence 31 scheduled to run each month right after the month-end processing, then the data mart has a latency of one month. If the data load runs nightly, the data mart can have a latency of up to one day. The latency of the data mart must be set properly to fulfill the business intelligence requirements of that data mart. The information provided by the data mart must be up- to-date enough to facilitate effective decision making. However, the data loads should not occur so often that they cause unneeded stress on the OLTP systems. Consolidation and Cleansing Data from a number of different OLTP systems may be combined into a single data mart. This enables us to calculate some complex measures for our business intelligence. As we discussed earlier, this may also cause problems. Multiple OLTP systems can have different ways of representing data. Inconsistent data types used for the same data, dissimilar unique identifiers used for the same entity, and different time periods and calendar systems can all cause a great deal of difficulty when trying to combine data from heterogeneous systems. In fact, problems can even arise when using data from a single system. The business rules necessary for a meaningful measure calculation may be stricter than those enforced within the OLTP system itself. If this is the case, some of the data coming from the OLTP system may not meet the stricter rules. Inconsistencies with data types and unique identifiers could also exist within the same system if the database has been poorly designed or poorly maintained. These problems must be resolved before the data can be stored in the data mart. We must scrub out all the problem data. To do this, the data is put through a data cleansing process. Definition Data cleansing removes inconsistencies and errors from transactional data so it has the consistency necessary for use in a data mart. Data cleansing transforms data into a format that does not cause problems in the data mart environment. It converts inconsistent data types into a single type. Data cleansing translates dissimilar identifiers to a standard set of codes for the data mart. In addition, it repairs or removes any data that does not meet the business rules required by the measures calculated from this data mart. Data cleansing is usually done as part of a larger process. This process extracts the data from the OLTP systems and loads it into the data mart. Thus, the entire procedure is known as extract, transform, and load—or ETL.
Đồng bộ tài khoản