Analysis Services: Choosing Dimension Types in SQL Server 2000 Analysis Services
Microsoft SQL Server Analysis Services
Summary: This paper describes how and when to use the variety of dimension types available in Microsoft SQL Server 2000 Analysis Services. (24 printed pages)
Using Microsoft® SQL Server™ 2000 Analysis Services, you can easily create OLAP cubes and dimensions from a variety of data sources. In SQL Server 7.0 OLAP Services, you only need to make a few choices to build a dimension, but there are some limitations to the types of dimension characteristics that you can easily model. In contrast, SQL Server 2000 Analysis Services allows you to model a much richer set of dimension types. This paper describes how and when to use the variety of dimension types available in SQL Server 2000 Analysis Services.
The choice of dimension type that you use when creating a cube is based on several factors. These factors include the type of business problem that you want to model, the perspectives from which you want to view data, the information you want to expose, and the characteristics of the existing data set (size, structure, and member relationships). This paper will help you choose the most appropriate dimension type to model a dimension based on these criteria. It will explain why to implement each type of dimension, provide business examples, discuss implementation details and describe how dimension choice will affect processing, query performance, and cube maintenance.
This paper assumes that you have a basic knowledge of data warehousing concepts, and an understanding of multidimensional terms including cube, dimension, measure, and aggregation. Some experience building simple cubes with SQL Server 7.0 OLAP Services is also helpful.
After studying this paper, you will be able to:
- Choose an appropriate dimension type based on the type of business problem, the perspectives to view data, and the characteristics of the data.
- Create shared or private dimensions.
- Create regular dimensions from a star or snowflake schema.
- Represent ragged hierarchies.
- Define multiple hierarchies within a dimension.
- Use parent-child dimensions to represent unbalanced hierarchies.
- Specify custom rollup in a dimension.
- Create member properties.
- Use a virtual dimension to analyze data based on dimension attributes.
This paper begins by discussing the impact of the structure of source data on building dimensions and cubes. It presents some design choices that apply to all dimension types, such as sharing dimensions across cubes and using multiple hierarchies. Examples of regular dimensions with balanced or ragged hierarchies, virtual dimensions, and parent-child dimensions with unbalanced hierarchies are discussed. For each case, the paper describes the choices you need to make when modeling these dimension types using Analysis Services.
The choice of dimension type depends in part on how the data is stored and organized in the underlying relational database. When you use a star or snowflake schema to contain the data, the relational schema also reflects elements of the multidimensional design.
Dimension Tables Contain Cube Dimension Characteristics
The perspectives from which you can analyze data are usually contained in dimension tables in the relational database. Dimension tables contain the characteristics of a dimension. These tables have columns that describe how to aggregate or roll up the data as well as columns that provide additional information about members of a dimension. For example, a Customer dimension table contains the following columns:
Figure 1. Customer dimension table
In the customer table, country, state_province, city, and name define a hierarchy by which we want to view and calculate summary information in the cube. The hierarchy has the following structure:
The remaining columns provide information about the members at the lowest level of the hierarchy, in this case, the individual customer. To enable you to analyze data or filter the view of data in the cube by using these characteristics, you can create member properties in Analysis Services dimensions from these columns. Additionally, one column joins the dimension to the central fact table using a surrogate key.
In a star schema, a single table in the relational database stores all the information for a particular dimension. Although a star schema typically contains only a few dimension tables (three to eight), it can have as many dimension tables as is needed to describe a business process. You will most often create Analysis Services dimensions from a single dimension table in the relational database.
Dimension Information from Multiple Tables
You can use multiple tables and data sources to create an Analysis Services dimension. Many data warehouses use this type of multidimensional schema. In a snowflake schema, the information for a dimension is spread among multiple tables. For example, a snowflake schema might contain product information in the product and product class tables. The product_class_id column joins these two tables. The product table is also joined to a central fact table using the product_id column.
Figure 2. Product table
A well-planned relational data warehouse schema should be established before you start building cubes. If the underlying schema uses a snowflake design, and you want to mirror the dimensional design in the Analysis Services dimension, then you should choose a snowflake schema when creating the Analysis Services dimension.
Choosing a Star or Snowflake Schema
When you design the relational schema for a data warehouse, you generally use a star or snowflake schema. The choice should be based on the requirements of how the relational data is used, if other applications will access the tables, and by maintenance requirements. After dimension data is loaded into Analysis Services, it is treated the same way regardless of the underlying relational schema.
Loading dimension information requires Analysis Services to issue an SQL query to the relational source. A snowflake schema uses one or more inner joins to the primary dimension table in order to load the dimension information. Using a snowflake schema will be slightly more expensive in terms of performance compared to using a star schema because of the additional joins. Even though a snowflake schema can save some space in the relational database, keep in mind that dimension tables are typically small compared to the size of the fact table. The dimension rows are likely to comprise less than 10% of the total size of the data warehouse, with the other 90% being fact table rows.
Additionally, most RDBMS systems do a good job of optimizing join performance for a small number of tables. The relational DBA can also help optimize performance by creating appropriate indexes on the join columns in the schema. The additional overhead of processing an additional join when loading a dimension will probably be small. Analysis Services handles using either type of schema equally well.
Dimension Information from Views on Relational Data
In most data warehouse projects, the star or snowflake schema design represents the end result of extensive analysis of a business processes. The data in a data warehouse system is generally not production data. The data may need to be gathered from many sources and cleansed to make it consistent. For all of these reasons, you will probably want to separate your data warehouse from your production database.
In some situations, you might need to build a dimension from tables that are not in your source database. While it is certainly possible to build dimensions and cubes based on an entity-relationship data model, it is recommended that you either create or simulate a star or snowflake schema.
If you don't have a star or snowflake schema in place, you can use views to provide a base for building dimensions and cubes. Creating these views helps to clarify your multidimensional design and makes it easier to build dimensions and cubes. For example, the Human Resources department maintains a master file of employee information in a fully normalized relational database. The data requires some scrubbing to make it consistent. You implement a view to filter the information, change the representation of some of the columns, and clarify the design of the dimension.
If you are using Microsoft SQL Server 2000 to manage the relational database, you can use indexed views to implement the star/snowflake schema. Using indexed views allows any changes in the source table to be reflected automatically in the view. Since SQL Server indexed views use a clustered index, query performance is similar to using a table.
You can also use a view to help you logically partition a dimension and a fact table. For example, one master Customer dimension table with a related fact table contains sales to all customers. A subset of customer sales represents sales to businesses rather than individuals. Suppose that you want to analyze the business-to-business sales separately from all the individual customer sales. You can create a view of the dimension table to filter customers and a view of the fact table to show only sales to those customers. Then, build a dimension and cube based on these views.
Using Dimension Filters
In addition to using views, you can logically partition a dimension table or fact table by filtering the rows used to build the dimension or cube. The Source Table Filter property applies to both dimension and cubes. It defines the subset of rows to read from the dimension or fact table when building a dimension or cube.
Use a dimension filter to specify which members of the source table are used to build a dimension. You must use the Dimension Editor to set the filter; you cannot use the Dimension Wizard. For example, you want to create a dimension containing customers in the Netherlands. Set the Source Table Filter property for the dimension in Dimension Editor to:
Example 1 "Customer.Country" = 'Netherlands'
In addition, you can define multiple criteria to filter rows when creating a dimension. For example:
Example 2 "Customer.Country" = 'Netherlands' or "Customer.Country" = 'Belgium Example 3 "Product.ProductFamily" = 'Drink' and "Store Type.Store Type" = 'Supermarket'
Dimension filters can be defined in all dimension types except virtual dimensions.
Part of designing an Analysis Services dimension is determining how it will be used to build cubes and perform analysis. Your design will influence the choice of making a dimension shared or private, as well as representing multiple hierarchies in a single dimension or in multiple dimensions.
Sharing Dimensions Across Cubes
You commonly build different cubes to represent different aspects of your business. These cubes can be based on different fact tables. When you create a new dimension in Analysis Services, you must define it as private or shared. Choosing to make a dimension private or shared depends on the structure of the fact tables in the warehouse and the business process you are trying to model.
Using Private Dimensions
Private dimensions can be used in only one cube. Use a private dimension when the characteristics by which you want to view and aggregate data exist in only one cube and relate only to a particular fact table.
You can also use private dimensions to simplify cube maintenance. Private dimensions are always processed along with the cube on which they are defined. They do not have to be processed separately. If there are changes in a shared dimension, all the cubes that use that shared dimension must be reprocessed. Creating private dimensions allows you to minimize the processing of multiple cubes. If the cubes are very large, this strategy can help reduce the overall processing time by processing only one cube rather than several. For example, a master customer dimension contains information about general customers of a business. You create a private customer dimension that contains only certain customers. You create a separate cube to analyze the sales to these customers only. This strategy allows you to process the cube containing the sales to special customers without having to process all the cubes that share the master customer dimension.
Sometimes, you want several cubes to use the same dimension structure but with some different dimension characteristics. You might define a dimension that contains custom member formulas. These formulas may only be valid for one cube. For example, a Budgets cube uses an accounts dimension that contains a custom member formula. This formula looks up information from the Sales cube. Even though you want to analyze the information in the Sales cube by members of the accounts dimension hierarchy, you cannot include the accounts dimension in the Sales cube because it results in a circular reference. In this case, you must build two separate dimensions that have the same hierarchical structure but different custom member formulas.
After a dimension is defined as private, it cannot be converted to being shared. Because of this, consider defining a dimension as shared if there is any possibility that it may be used in more than one cube.
Using Shared Dimensions
You can use shared dimensions when defining multiple cubes. Create shared dimensions when you want to ensure that dimension characteristics mean the same thing in each cube where the dimension is used. For example, you create a Product dimension that contains all of the products manufactured by a company. It represents the master list of products and defines a rollup hierarchy as well as many descriptive product properties. The information could be gathered from several sources and scrubbed to eliminate duplicates and inconsistencies. By using this dimension in different cubes that model related business processes (manufacturing, inventory, or sales), you enable several cubes to provide a consistent picture of your business across multiple processes. If you used a different product dimension for each cube, there could be inconsistencies in the interpretation of product descriptions and properties. In addition, shared dimensions make it possible to combine information from two related cubes in a virtual cube. In this case, the shared dimension acts similar to the join column when combining tables in a relational view.
Time dimensions are almost always shared dimensions. We frequently want to analyze a set of business processes by time. A time dimension usually contains a listing of dates for a given time period. Each date can have several attributes, such as day_of_week, day_of_month, is_holiday, or other attributes that relate to legal periods or marketing seasons. In addition to ensuring consistency, a shared time dimension saves you time because you do not have to re-create the dimension for each cube.
You can create a shared dimension from a single dimension table, multiple dimension tables, an OLAP data-mining model, or from the member properties of another dimension. The dimension table or the mining model you select should contain the column or columns you want to include in the shared dimension.
You should make a dimension shared unless there is some explicit reason to make it private. Using shared dimensions can also result in more efficient use of server memory. This is because the members are loaded only once and are not duplicated by those in private dimensions.
Representing Multiple Hierarchies in a Dimension
You can create multiple hierarchies for a dimension to provide alternative views of dimension members. For example, a time dimension may have a calendar year view and a fiscal year view. These dimensions can have different level structures.
A retail calendar may divide the year into thirteen "months" of four weeks each. This hierarchy rolls up into retail seasons that do not map the standard calendar quarters. Provided that the dimension tables contain data that fully describes how the hierarchy is organized, you can create an additional hierarchy to model the retail calendar.
Another example is a customer dimension that contains two hierarchies, one based on states, the other based on sales regions. In this example, the states do not roll up into sales regions.
In Analysis Services, a dimension with multiple hierarchies is implemented as a collection of dimensions that share the same relational source data. If you define a dimension with multiple hierarchies rather than build separate dimensions, Analysis Services can choose a set of aggregations to build so that they are useful to both hierarchies.
Even though you could model an alternate hierarchy by creating a new dimension or a virtual dimension based on the original data columns, you should build multiple hierarchies of a dimension rather than use separate dimensions. The most important reason for this is the savings in determining a set of useful aggregations. In addition, many client tools understand the concept of multiple hierarchies and can expose this to users to enhance analysis.
To inform Analysis Services that a dimension is an alternate hierarchy for another dimension, you name the new dimension using the following format:
You can use the Dimension Wizard or the Dimension Editor to build multiple hierarchies of a dimension. For each hierarchy, the process is similar to creating a new dimension. You can also use an existing dimension as a template for a new dimension hierarchy. Edit the dimension and then save it with a new name in the form Dimension.Hierarchy.
Managing Changing Dimensions
When you create a new dimension, you can specify it as a changing dimension. Changing dimensions are optimized for frequent changes to the source data and the dimension structure. Changing dimensions do not need to be fully processed when certain types of changes are made. As a result, the cubes based on these dimensions may not need reprocessing when the underlying dimensions are modified. Since full dimension processing will interrupt users access to a cube, changing dimensions increase cube availability by reducing the frequency of processing the cube.
You should make a dimension a changing dimension when:
- Users need to see dimension updates very soon after they are made.
- The time window for updating the cube is limited.
- Dimension data changes frequently and unpredictably.
Although changing dimensions improve cube accessibility, queries that use changing dimensions tend to take longer to process. You must take this trade-off into account when deciding to make a dimension a changing dimension.
In a changing dimension, you can add, move, rename, or delete members that are below the top level and above the bottom level in the hierarchy. If the dimension has an (All) level, this is considered the top level. If the dimension is a parent-child dimension, all leaf members are considered to be at the bottom level in the hierarchy. Any cubes that use the dimension will be updated when you save the dimension.
In non-changing dimensions, the full dimension path of each member is stored in the cube. For example, a cube that uses a Customer dimension might store the member as [Customers].[USA].[New York].[Mike Nash]. If this member is moved, renamed or deleted, the cube must be reprocessed because the reference to the member in the dimension is lost. Changing dimensions overcome this problem by storing a key rather than a full member path. When a member is retrieved, the member key is mapped to a name in a mapping table.
Certain types of dimensions are automatically built as changing dimensions. These include, virtual, parent-child, and ROLAP dimensions. To specify other dimension types as changing, you can use Dimension Editor to set the following properties:
- Set the Changing property to TRUE.
- Set the Member Keys Unique property of the lowest level of the hierarchy to TRUE.
- For private dimensions, set the Aggregation Usage property to STANDARD.
- For shared dimensions, set the Aggregation Usage property to a value other than CUSTOM. Custom aggregation is not valid for parent-child dimensions and changing dimensions.
Regular dimensions can be based on information in a star or snowflake schema and can be shared or private. Use Dimension Editor to create shared regular dimensions or Cube Editor to create private regular dimensions.
The characteristics of the hierarchies in the dimension determine if you should create a regular dimension. A regular dimension contains the same number of levels as the number of columns selected during its definition. These levels are usually organized from most general to least general. For example, if a Customer dimension is based on the City, State, and Customer Name columns from a table, the hierarchy will have three levels plus an optional (All) level.
By default, the dimension contains an (All) level used for the top-level aggregate. When you build a dimension in Dimension Editor, you can choose to include or exclude the (All) level by clicking the Advanced tab, and then clicking All Level.
In addition to these characteristics, the members in each level of the hierarchy are distinct entities and could not be moved in the hierarchy. For example, the members that make up the Customer Name level are fundamentally different from those on the State level. It would not make any sense for members of the Customer Names level to appear as members in any other level in the hierarchy.
You can use a regular dimension to model either a balanced or ragged hierarchy.
Modeling Balanced Hierarchies
In a balanced hierarchy, all branches of the hierarchy descend to the same level, and each member's logical parent is the level immediately above the member. Balanced hierarchies have a symmetrical number of levels for each of its branches. Many common dimensions exhibit balanced hierarchies.
A time dimension usually has a balanced hierarchy. For example, a time dimension has the following hierarchy:
In this time dimension tree, all branches have common levels. In addition, each level has at least one member. All branches of the tree descend through all the existing levels. There are no empty positions in the hierarchy.
Figure 3. Time dimension tree
Other examples of balanced hierarchies may include a product or merchandise dimension. In this dimension, all product SKUs can be characterized by package size, brand, category, and department. For example:
Common to both examples is the idea that there are no skipped or empty levels in the hierarchy. All the lowest level members roll up through all the levels in the hierarchy. All the branches of the hierarchy terminate at the leaf level and the lowest level members are the same distance away from the top level.
Modeling Ragged Hierarchies
You can use a regular dimension to represent a ragged hierarchy. In a ragged hierarchy, branches of the hierarchy can descend to different depths, and each member's logical parent can be located more than one level above the member. Another way to think about it is to say that a ragged hierarchy has empty positions in the member hierarchy.
A geography-based dimension may contain ragged hierarchies. For example, a Customer dimension has the following hierarchy:
This dimension models our customers in several countries. Some countries have no states. In this example, Israel has no states. The parent of the Tel Aviv and Haifa members is not located immediately above them in the hierarchy. In this branch, the distance between the leaf members and their logical parent is two levels rather than one.
Figure 4. Ragged hierarchy tree
Other examples of ragged hierarchies are product dimensions for a store that has diverse product lines. For example, a large home store sells both groceries and hardware, as shown in Figure 5.
Figure 5. Example of a ragged hierarchy
Given this hierarchy, a product SKU may not have an associated sub-brand or a promotional packaging, shown as follows:
Creating Ragged Hierarchies in Regular Dimensions
You can use Dimension Editor or Cube Editor to create a ragged hierarchy and hide members from end users while they browse a regular dimension. When you create a dimension, set the Hide Member If property for a level to tell Analysis Services how to create a ragged hierarchy. Set this property in Dimension Editor if the dimension is shared or in Cube Editor if the dimension is private. To find this property in Dimension Editor, in the Properties pane, click the Advanced tab.
The Hide Member If property has five possible values.
|Never hidden||Default, no members are hidden.|
|No name||Every member whose name is null or an empty string is hidden.|
|Parent's name||Every member with the same name as its parent is hidden.|
|Only child with no name||Every member that is an only child and whose name is null or an empty string is hidden.|
|Only child with parent's name||Every member that is an only child and has the same name as its parent is hidden.|
How you implement a ragged hierarchy in a dimension depends on how the information was represented in the relational source and whether the missing members are leaf members in the tree. In regular dimensions, each level of a hierarchy maps to a column in the relational source. You can choose to represent a missing member in the hierarchy several ways:
You can use a NULL or an empty string in the source table. For example, the following table is used to create a geography dimension with some missing members.
In this case, set the Hide Member If property to No name. This tells Analysis Services to build the dimension and to hide any member that has a NULL or empty string value.
Alternatively, you can use the same name as the parent member to represent a hidden member in the source table. For example, you could represent a ragged hierarchy in the relational source by using the same name as the parent for countries that do not have states.
In this case, set the Hide Member If property to Parent's name. This tells Analysis Services to build the dimension and to hide any member that has the same name as its parent.
The missing members can also be on the leaf level of the tree. For example, a geography hierarchy may record neighborhood districts for some cities but not for others.
In this case, set the Hide Member If property to Only child with no name or Only child with parent's name, depending on the use of NULL values or the parent's name in the relational source.
In Analysis Services, you use a parent-child dimension to model an unbalanced hierarchy.
Modeling Unbalanced Hierarchies
In an unbalanced hierarchy, branches of the hierarchy descend to different levels. Organization charts are a classic example of an unbalanced hierarchy. The CEO is the top member in the hierarchy, and the division managers and executive assistant are immediately beneath the CEO. The division managers have subordinate members but the executive assistant does not. There is only one node on the tree for each employee.
Figure 6. Unbalanced hierarchy tree
Using Parent-Child Dimensions
The characteristics of the data in the relational source also indicate when you want to use a parent-child dimension. In a regular dimension, the number of levels in a hierarchy depended on the number of columns chosen from the relational source and the number of columns that each column mapped to a level in the dimension. Also, each column used for a regular dimension represented a single entity type of members. For example, all members at the City level represented city names. These characteristics are not true of parent-child dimensions.
In a parent-child dimension, two columns from the relational source together define the lineage relationships among the members of the dimension. One column identifies each member. In Analysis Services, this is the Member Key Column. The other column identifies the parent of each member. These columns are used to build an unbalanced hierarchy. In Analysis Services, this is the Parent Key Column. The resulting dimension can have many levels but each member is the same type of logical entity. You can also select a third column to provide member names to display when browsing cubes. This is called the Member Name Column and defaults to the column containing the member identifiers. If you do not want to display these identifiers, select an alternative column if one is available.
For example, you define a parent-child dimension by using the Employee ID columns to identify each member, the Manager ID column to identify the parent of each member, and the Name column to provide member names. Each member in the hierarchy is the same kind of logical entity, in this case, employees of a company.
|Employee ID||Name||Manager ID|
Notice that the top-level member, Andrew Fuller, has a NULL value for Manager ID. You can choose to represent the top-level member of a parent-child dimension by using a NULL or by using the members' own identifier in the relational source.
By default, a top-level member in a parent-child dimension has a parent identifier that equals its own identifier, null, 0 (zero), or a value not contained in the column for member identifiers. Using Dimension Editor, you can set the Root Member If property for a parent-child dimension to control how the top-level member is identified. This property can have four possible values:
|Parent is blank, self, or missing||A member is in the highest level if any one of the following three criteria is met. (The default value.)|
|Parent is blank||A member is in the highest level if its value in the Parent Key Column is null or 0 (zero).|
|Parent is self||A member is in the highest level if its value in the Parent Key Column is equal to its value in the Member Key Column.|
|Parent is missing||A member is in the highest level if its value in the Parent Key Column does not exist in the Member Key Column.|
Data for Nonleaf Members
With regular dimensions, there is a direct relationship between rows in the fact table and individual dimension members. All the fields in a dimension table row logically describe one dimension member at the lowest level of the hierarchy. For regular dimensions, there is no data in the fact table that relates to members at any other than the lowest level. For parent-child dimensions, there can be data for members at any location in the hierarchy.
Data for Members of a Parent-Child Dimension
In a parent-child dimension, all the members are of the same entity type and can exist at any level in the hierarchy. There can be fact table rows associated with members at any level. For example, you build a cube that analyzes sick days and vacation days for employees. The employee dimension is modeled using a parent-child dimension. Each fact table row records a day taken off by an employee.
Many parent-child dimensions will have data for nonleaf members in the fact table. If this case, you must set the Members With Data property for the dimension when building a cube. Otherwise, cube processing fails. By default, nonleaf members are not allowed to have associated fact table data.
The Members With Data property has the following values:
|Leaf members only||The default, leaf members only, can have associated fact table rows.|
|Nonleaf data hidden||Nonleaf members can have associated fact table data. This data is not represented among the descendents of the nonleaf members. Consequently, it might appear to end users that values aggregate incorrectly.|
|Nonleaf data visible||Nonleaf members can have associated fact table data. This data is represented among the descendents of the nonleaf members by the creation of a child for each nonleaf member.|
Creating Parent-Child Dimensions
These two common scenarios may help you understand when to use each option.
Example of Handling Nonleaf Data
The business scenario you want to model and the characteristics of the underlying fact table data influence how you implement a parent-child dimension.
Suppose you have an Employee dimension that has the following members:
Figure 7. Example of an employee tree branch
The fact table contains employee salary information. For example:
You want the end user to be able to see the data so that the individual amount for the manager salary is also shown on the report, in addition to the aggregation for all of their employees. For example, a client application displays the following report:
In this view of the data, manager's individual salaries are visible along with the aggregate data for their employees. The aggregate includes the manager's salary. To facilitate this view, you would set the Members With Data property to Nonleaf data visible.
When you set the Members With Data property to Nonleaf data visible, you need to specify how to display the name of the data members. The Data Member Caption Template property controls the names of data members. If you type a value that includes an asterisk (*), the name of each data member is the value with the asterisk replaced by the parent member's name. You set this property for each level in the dimension. In this example, this property was set to *.Individual.
If you set the Members With Data property to Nonleaf data hidden, the report would not display the individual salaries for Andrew Fuller or Stephen Buchanan. However, the aggregates would still include the data for the nonleaf members. This could be confusing to end users who may not understand how these salaries are being aggregated. For example, the client application displays the following report:
Example of Using Leaf Members Only Data in a Parent-Child Dimension
You can use a parent-child dimension in financial applications to present a summary of accounts. For example, your business tracks several categories of expenses and sales. These categories are arranged into a hierarchy that describes how to roll up the aggregates of each category. The totals from each category are added to, subtracted from, or ignored by its parent in the hierarchy.
Figure 8. Parent-child dimension
As with any parent-child dimension, the number of levels in the hierarchy depends on the characteristics of the data in the relational source. The following table represents the dimension table used to build the Account dimension hierarchy. The account_id column is the member key column and the account_parent column is the parent key column. The account_description column is used for the member name column. Set the Root Member If property to Parent is blank to handle the blank values in the account_parent column for the members of Assets and Liabilities.
|3200||3000||Cost of Goods Sold||Income||-|
|4100||4000||General & Administration||Expense||+|
In this type of dimension, the intermediate level members represent the aggregates of their children and do not have any associated fact table data. In this example, only leaf members will have fact table rows. To model this situation, leave the Members With Data property set to the default of Leaf members only. The following table contains some sample fact table data. Notice that there are entries for the lowest level expenses only.
Using Custom Rollups
You can specify how to roll up members in a hierarchy when building a parent-child dimension. In the summary of accounts hierarchy, the data for each member is aggregated according to a value in a column of the relational source.
Using Unary Operators
Custom rollup operators provide a simple way to control how member values are rolled up to their parent's values. In the previous example, the account_rollup column in the accounts dimension table contains an operator that specifies how to aggregate data. This column contains a plus sign (+) to specify adding the data to the aggregate of members at the same level, a minus sign (-) to subtract the data, or a tilde (~) to exclude the data from the aggregate. You can also use the other unary operators (/ or *) to specify a custom rollup. The value of the custom rollup operator is unique for each level member.
Using Custom Rollup Formulas
In addition, you can use a custom rollup formula to specify how to aggregate the members of a hierarchy. Custom rollup formulas use Multidimensional Expressions (MDX) to determine how the members are rolled up. Unlike using the unary operators to specify rollup, a custom rollup formula applies to all members (except calculated members) in a level.
For example, you have a time dimension used to analyze sales by quarter and year. You define the dimension using the standard Sum function to aggregate the data.
|Units Sold||Year Aggregate|
You decide that the most important figure for your business is the amount sold in the last quarter of each year. You want the aggregate at the year level to reflect only last quarter of the year. You create the following custom rollup formula at the Year level:
This custom rollup formula overrides the Sum aggregate function and produces values for the all years to reflect the sales during Quarter 4. Note that the values for the Quarter members are unchanged.
|Units Sold||Year Aggregate|
Virtual dimensions enhance the analysis and presentation of cube data. A virtual dimension is a logical dimension based on the contents of an existing physical dimension.
You build a virtual dimension when the criteria by which you want to view cube data is already contained in another dimension, and you do not want to build a new dimension hierarchy. You always have the option to build a new dimension.
The main advantages of virtual dimensions are storage savings and reduced cube processing time. The dimension members and structure are held in memory and stored on disk like a physical dimension. However, aggregation data for virtual dimensions is not stored, but is calculated in memory. When you add a virtual dimension to a cube, processing time and storage for aggregations do not increase as when you add a regular or parent-child dimension.
A virtual dimension can be based on either of the following:
- Member properties in the physical dimension
- Columns that represent the hierarchy of a physical dimension
- Other columns in the tables on which another physical dimension is based
Identifying Member Properties
You can use a virtual dimension to analyze data in a cube by the member properties in a dimension. A member property is an attribute associated with all the members in a level. For example, a Customer dimension has Country, State_Province, City, and Name levels. A level can have several member properties associated with it. For the Name level, you want to record information about the gender, marital status, and education of each customer. You can create member properties at other levels in the hierarchy. For example, you want to track the population range of cities in addition to individual customer information. You create a member property at the city level that contains a value for the following population ranges:
Up to 10,000
10,001 – 20,000
20,001 – 30,000 and so on…
This member property applies only to members at the city level.
When choosing member properties, keep in mind that there is an implicit one-to-one relationship between a member and a member property value. In the previous example, each city has only one value of the Population Range property associated with it.
Creating Member Properties
You create member properties in dimensions to:
- Contain information that an end user may want to browse. Member properties can be exposed to end users by a client application. The method of displaying member properties varies from application to application. For example, an end user can right-click a member to view its member properties. If you don't want to expose the properties, you can set the member's Visible property to FALSE. This will hide the member in cubes where it appears but you can still reference it by using an MDX statement.
- Enable calculations. For example, you can create the Store Size in SQFT as a member property of the Stores dimension. You then create a virtual dimension based on this member property. You can use the members of the Store Size in SQFT dimension to create a calculated member (measure) for a cube that displays the sales per SQFT for each store.
You can also use a member property to enable comparison of members at the same level. For example, your company manufactures several different kinds of laundry detergent. Some is in powder form, some are liquid, and some are solid tablets. The hierarchy records the package size in grams. However, you want to compare the price of the detergent based on the number of loads of laundry it can clean. You create a member property, Uses_per_Package, which records this information. You can then use the member property to build a virtual dimension. In the cube, you build a calculated member to show the price per use by dividing the package price by the number of uses.
- Enhance analysis and presentation of data by building virtual dimensions.
Enhancing Analysis with Virtual Dimensions
Virtual dimensions also allow you to display member names from a single dimension table on multiple axes in a client application without increasing cube size. Without virtual dimensions, you would need to use multiple regular or parent-child dimensions to present the information to the end user. Specifically, you can display the members of a dimension level on one axis and an associated member property on the other axis. This presentation is useful when end users want to explore the trends of measures depending on the relationship between members and member properties.
For example, a user wants to explore the relationship between Store Location and Store Type with respect to Unit Sales. The Store dimension already contains Store Country as a member of the hierarchy. You add Store type as a member property and create a virtual dimension from it. By adding the virtual dimension to the cube, the user can cross-reference Store locations and Store Types and compare the Sales at the intersections. The following report was created using this new virtual dimension.
|Unit Sales||Store Type|
|Store Country||Store State||Deluxe Supermarket||Gourmet Supermarket||Mid-Size Grocery||Small Grocery||Supermarket||Grand Total*|
|USA Total *||76837||21333||11491||6557||150555||266773|
|Grand Total *||76837||21333||11491||6557||150555||266773|
Creating Virtual Dimensions
You can create new virtual dimensions by using the Dimension Wizard or Dimension Editor.
Using the Dimension Wizard
Use the Dimension Wizard to create a virtual dimension based on member properties or the columns that represent the hierarchy of a physical dimension. The member properties of the source dimension are shown by default. If you want to base the virtual dimension on the source hierarchy, select the Display Member Keys and Names check box to list the physical dimensions keys and names for each level in the hierarchy.
Using Dimension Editor
Use Dimension Editor to create a virtual dimension based on the columns in one of the source dimension's tables. These columns do not have to be part of the physical dimension's definition (for example, the source of a level). Creating a virtual dimension is similar to creating a regular dimension. However, you must set the IsVirtual property to TRUE and set the DependsOnDimension property to the name of the source dimension.
Since virtual dimensions are based on existing dimensions, you do not need to set many of the properties for the virtual dimension and dimension levels. By definition, the following properties cannot be changed.
Analysis Services virtual dimension members are stored in multidimensional OLAP (MOLAP), just like physical dimensions.
You cannot model a ragged hierarchy by using a virtual dimension.
Analysis Services virtual dimensions are changing dimensions.
Keep in mind that all virtual dimensions have an (All) level. This is created automatically when you create a virtual dimension.
If you don't want to expose member properties to the client application, but still want to analyze data based on these attributes, create a virtual dimension based on columns in the source table rather than on member properties.
You can add a virtual dimension to a cube only if the physical dimension that supplies the member properties or columns is also included in the cube.
Creating dimensions in SQL Server 2000 Analysis Services is a slightly more complex task than creating dimensions in SQL Server 7.0 OLAP Services. However, Analysis Services provides a rich set of tools for modeling many types of dimensions and enables comprehensive analysis of your data. You should observe the following guidelines when designing dimensions:
- The choice of dimension model and the structure of data in the relational database are highly related. While you can use Analysis Services to build dimensions and cubes from many data sources, it is recommended that you perform a thorough analysis of your business requirements and build or simulate a star or snowflake schema in the relational database. Having a clear idea of your analysis goals and a relational design that reflects your multidimensional design will make it easier to recognize the types of dimensions you need to build.
- Create shared dimensions unless you have a strong business reason to use a private dimension.
- Create multiple hierarchies of a dimension rather than several independent dimensions from the same source data.
- Choose the dimension type based on the business problem you want to model, the structure of data in the tables, the existence of other dimensions, the type of hierarchy to model, the relationship of members in the hierarchy (single or multiple entity types), characteristics of members (data and member properties), rollup characteristics, and analysis requirements (presentation on multiple axes).