Data Warehousing and Online Analytical Processing
Microsoft® SQL Server™ 2000 provides components that can be used to build data warehouses or data marts. The data warehouses or data marts can be used for sophisticated enterprise intelligence systems that process queries required to discover trends and analyze critical factors. These systems are called online analytical processing (OLAP) systems. The data in data warehouses and data marts is organized differently than in traditional transaction processing databases.
Enterprise-level relational database management software, such as SQL Server 2000, was designed originally to centrally store the data generated by the daily transactions of large companies or government organizations. Over the decades, these databases have grown to be highly efficient systems for recording the data required to perform the daily operations of the enterprise. Because the system is based on computers and records the business transactions of the enterprise, these systems are known as online transaction processing (OLTP) systems.
The data in OLTP systems is organized primarily to support transactions, such as:
- Recording an order from a point-of-sale terminal or entered through a Web site.
- Placing an order for more supplies when inventory levels drop to a defined level.
- Tracking components as they are assembled into a final product in a manufacturing facility.
- Recording employee data.
- Recording holders of licenses, such as restaurant or driver licenses.
Individual transactions are completed quickly and access relatively small amounts of data. OLTP systems are designed and tuned to process hundreds or thousands of transactions being entered at the same time.
Although OLTP systems excel at recording the data required to support daily operations, OLTP data is not organized in a manner that easily provides the information required by managers to plan the work of their organizations. Managers need summary information from which they can analyze trends that affect their organization or team. They need to find the critical factors affecting the success of their organization, and how best to adjust those factors to improve the success of the enterprise. They need to find how the workload of their enterprise is affected by seasonal and yearly trends so that they can predict how many employees and resources will be required to perform future work.
Systems designed to handle the queries required to discover trends and critical factors are called online analytical processing (OLAP) systems. OLAP queries typically require large amounts of data. For example, the head of a government motor vehicle licensing department could ask for a report that shows the number of each make and model of vehicle registered by the department each year for the past 20 years. Running this type of query against the original detail data in an OLTP system has two effects:
- The query takes a long time to aggregate (sum) all of the detail records for the last 20 years, so the report is not ready in a timely manner.
- The query generates a very heavy workload that at least slows down the normal users of the system from recording transactions at their normal pace.
Another issue is that many large enterprises do not have only one OLTP system that records all the transaction data. Most large enterprises have multiple OLTP systems, many of which were developed at different times and use different software and hardware. In many cases, the codes and names used to identify items in one system are different from the codes and names used in another. Managers running OLAP queries generally need to be able to reference the data from several of these OLTP systems.
OLAP data is organized into multidimensional cubes. The structure of data in multidimensional cubes gives better performance for OLAP queries than data organized in relational tables. The basic unit of a multidimensional cube is called a measure. Measures are the units of data that are being analyzed. For example, a corporation that operates hardware stores wants to analyze revenue and discounts for the different products it sells. The measures are the number of units sold, revenue, and the sum of any discounts. The measures are organized along dimensions. In this example, a three dimensional cube could have these dimensions: time, store, and products. Think of these dimensions as forming the logical x, y, and z axis of a three-dimensional, virtual cube.
Each dimension is divided into units called members. The members of a dimension are typically organized into a hierarchy. Similar members are grouped together as a level of the hierarchy. For example, the top hierarchy level of a time dimension can be years, with months at the next level, then weeks, days, and finally hours at the bottom level of the hierarchy. At each intersection of the three dimensions, the values for the measures that match those three dimension values are recorded. For example, suppose that the hour starting at 1:00 P.M. Saturday, Feb. 19, 2000 is a time dimension member, Store #2 of Albany, New York is a store dimension member, and Easy-Clean Mops are a product dimension member. Where these three dimensions meet, the cell records that 10 mops were sold for revenues of $90.00 and an average discount of $1.00.
The specific dimensions and measures defined for the cubes in any particular OLAP system depend on the kinds of analysis important to the enterprise. Transforming OLTP data from relational tables into OLAP cubes, and the design of the cubes, is a complex area that is the subject of many third-party books.
OLAP systems operate on OLAP data in data warehouses or data marts. A data warehouse stores enterprise-level OLAP data, while a data mart is smaller and typically covers a single function in an organization.