| Oracle9i Warehouse Builder Transformation Guide Release 9.2 Part Number B12001-01 |
|
This appendix provides a brief introduction to the different types of Slowly Changing Dimensions. It also goes through a case study scenario to demonstrate how to use Warehouse Builder to design and deploy different types of Slowly Changing Dimensions. For additional information, refer to books that discuss data warehousing such as The Data Warehouse Toolkit by Ralph Kimball.
This appendix contains the following topics:
A Slowly Changing Dimension (SCD) is a well-defined strategy to manage both current and historical data over time in a data warehouse. You must first decide which type of slowly changing dimension to use based on your business requirements. Table A-1 describes the three main types of SCDs.
After selecting the type of SCD, proceed with the following steps to create the dimensions:
In order to use slowly changing dimensions, you must be using the following:
In this appendix, we will be demonstrating how to construct SCDs with the source and target systems described below. We will be using star schema to store data for all levels on the same dimension table target. This is one of the most commonly used strategy.
The geography data source table GEO_SRC will be used in our case study. Figure A-1 shows the attributes of the GEO_SRC table.
The target warehouse that will be created includes the following:
A geography dimension will be used as our case study for illustration. Typically a geography dimension has two levels: city and state. A city level is the lowest level among the geography hierarchy, while a state level is the higher level. A simplified city level, shown in Figure A-2, has the following attributes:
A simplified state level, as shown in Figure A-3, has the following attributes:
With Type 1 SCDs, you keep no history and only store the latest value of the dimension record. Once you define the dimension GEO_DIM, you can use it in your mapping to load data into it. To load Type1 slowly changing dimensions, you extract data from the source and then directly load them into the target. GEO_SRC is the source table from which data will be loaded into the dimension GEO_DIM. Figure A-4 shows the mapping used in this example.
Use the following steps to finish creating the Type 1 SCD:
To ensure that unique numbers are assigned for surrogate keys for new dimension records, a sequence operator is used to map to the surrogate key column of GEO_DIM, which is CITY_ID (lowest level key).
You should configure the properties for GEO_DIM operator, as shown in Figure A-5, to ensure data loads properly. First, you need to configure the loading type of GEO_DIM to be 'UPDATE/INSERT'.
You also need to configure each mapped column.
If your target database type, which is configurable from warehouse module configuration properties, is set to Oracle9i, the MERGE feature is ensured for you when you generate code.
With Type 2 SCD, you always create another version of dimension record and mark the existing version as history. To accommodate this, you need to create extra metadata for your dimension table, including an effective date column and an expiration date column. These columns are used to differentiate a current version from a historical version as follows:
You also need to decide which columns you want to store historic data for when the values are to be changed. These columns are defined as trigger columns and should be described as part of your metadata.
Once you define your dimension GEO_DIM_TYPE2, you can use it in your mapping to load data into it. GEO_SRC is the sample source table here from which data are to be loaded into GEO_DIM_TYPE2.
To load Type 2 slowly changing dimension, you need to transform data extracted from the source properly before you load them into the target. You achieve this by creating a mapping, such as the one displayed in Figure A-6. In this mapping, data is first extracted from GEO_SRC, transformed by a series of operators, and finally loaded into GEO_DIM_TYPE2.
You must be very curious about how data are actually transformed. Warehouse Builder supports all operators you would need for Type 2 slowly changing dimension. With Warehouse Builder, the whole ETL process of Type 2 slowly changing dimension can be done in one single mapping. Let us take a look at how data are transformed in a step-by-step fashion.
Use the following steps to create a Type 2 SCD:
First of all, for each source row from GEO_SRC, you need to figure out if it has matched a current dimension record in GEO_DIM_TYPE2. To do this, a Joiner is used to match GEO_SRC with GEO_DIM_TYPE2 exclusively using outer join by natural key columns as the join condition. Figure A-7 shows the expression used for this condition.
Also notice that GEO_SRC should only match current dimension records in GEO_DIM_TYPE2, rather than history dimension records. To do this, you apply a filter operator to filter out history records from matching.
After Joiner, the output data are now composing both the source data rows and the matched target rows. For each output row of Joiner, you need to categorize it into the following groups:
Do this categorizing by splitting the Joiner output into OPEN_SET and CLOSE_SET groups using a Splitter.
A Joiner output row will be put into OPEN_SET group if it comes from a row in GEO_SRC that is either matching with any current version in GEO_DIM_TYPE2, or matching with no version. Do this by specifying the splitter condition for OPEN_SET group.
A Joiner output row will be put into CLOSE_SET group if both the following two condition are true:
Specify the splitter condition for CLOSE_SET group to AND the above two condition clauses.
With OPEN_SET and CLOSE_SET, you compute the following two delta sets with which GEO_DIM_TYPE2 is to be loaded:
You use Expressions to accomplish both tasks. UPDATE_DELTA_ROW and MERGE_DELTA_ROW are created as two separate Expression operators from output of CLOSE_SET and OPEN_SET, respectively. The output groups of both Expression operators are then UNION by utilizing a SetOp operator, whose output row set is ready to be mapped to GEO_DIM_TYPE2 directly.
UPDATE_DELTA_ROW represents the row set that the final target row is to be overwritten from in order to mark a current matched version as historical. Specifically, the target expiration timestamp need be updated with current system date value. This operation is also known as to close the current version. To accomplish this, you specify the expression of attribute DATE_EXP to be SYSDATE.
For the rest of the columns, you do not need to update them such that the original target column values are specified for the corresponding expressions.
MERGE_DELTA_ROW represents the row set that the final target row is to be overwritten from in order to:
Specifically, you need to build the expression for each final target column to differentiate between the above two scenarios by instantiating a CASE expression, i.e. 'Case When (...) Then (...) Else (...) End'. Fortunately, Warehouse Builder supports a user-friendly expression builder to accomplish this easily.
For DATE_EFF or any effective timestamp column, you specify the expression to:
Figure A-8 shows an example of how you specify the expression for DATE_EFF or any effective timestamp column.
For DATE_EXP or any expiration timestamp column, you specify the expression to:
Figure A-9 shows an example of how you specify the expression for DATE_EXP or any expiration timestamp column.
For CITY_NAME or any natural key column, you always overwrite with natural key value derived from source. Figure A-10 shows an example of how you specify the expression for CITY_NAME or any natural key column.
For CITY_ID_KEY or any surrogate key column, you need to preserve the surrogate key value derived from target in order to:
Figure A-11 shows an example of how you specify the expression for CITY_ID_KEY or any surrogate key column.
For STATE_NAME or any non-trigger column, you always overwrite with the value derived from source.
For CITY_POPULATION or any trigger column, you always overwrite with the value derived from source.
To ensure that unique numbers are assigned as surrogate keys for new dimension records, a sequence operator is used to insert the surrogate key column of GEO_DIM_TYPE2, which is CITY_ID.
The derived target surrogate key from UNION would be used to match with the final target surrogate key during loading. To achieve this, you create an additional attribute MATCHING for the final target and then map from the derived target surrogate key CITY_ID_KEY to it.
MATCHING attribute stands for the unique key of the final target that is chosen to be the matching criteria to ensure data loads properly. Here you should use the final target surrogate key column CITY_ID as MATCHING attribute. You achieve this by setting the bound name to be the same as CITY_ID:
Figure A-12 shows an example of how you can configure the properties for GEO_DIM_TYPE2 operator to ensure that data loads properly. First of all, you need to configure the loading type of GEO_DIM_TYPE2 to be UPDATE/INSERT.
You also need to configure each mapped column.
If your target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is ensured for you when you generate code.
With Type 3 SCD; you create a current value field to keep the current value of dimension record apart from its previous value. To achieve this, you need to create two columns for each data field, one for current value and the other for keeping previous value, respectively.
Once you define your dimension GEO_DIM_TYPE3, you can use it in your mapping to load data into it. GEO_SRC is the sample source table here from which data are to be loaded into GEO_DIM_TYPE3.
To load Type 3 slowly changing dimension, you extract data from the source and then transform them before directly load them into the target. You achieve this by the following mapping graph where data are first extracted from GEO_SRC, transformed by a series of operators, and finally loaded into GEO_DIM_TYPE3. Figure A-13 shows an example of this mapping.
Use the following steps to create a Type 3 SCD:
First of all, for each source row from GEO_SRC, you need to figure out if it has matched a current dimension record in GEO_DIM_TYPE3. To do this, a Joiner is used to match GEO_SRC with GEO_DIM_TYPE3 exclusively (using outer join) by natural key columns as the join condition.
For Type 3 SCD, you always overwrite current value columns of the target with that of the source. You accomplish this by creating mapping lines from Joiner output directly into the target, GEO_DIM_TYPE3.
For Type 3 SCD, it matters to you when and how to overwrite previous value columns of the target, including CITY POPULATION_OLD, CITY STATE_BUDGET_OLD, and CITY STATE_NAME_OLD.
Specifically you need to:
To achieve this, build an Expression from the previous Joiner result and instantiate the expression using a CASE expression. Figure A-14 shows an example of how to instantiate the expression using a CASE expression.
This is similar to what you have done for Type 1 SCDs. Refer to "Step 1: Populate the Surrogate Key" for details.
This is also similar to what you have done for Type 1 SCDs.
If your target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is ensured for you when you generate code.
Once you have constructed dimensions and mappings, you proceed to deploy and execute them through Deployment Manager. Run mappings using set-based mode with an Oracle9i database as your target to ensure optimal performance.
|
|
![]() Copyright © 2003 Oracle Corporation. All Rights Reserved. |
|