Skip Headers

Oracle9i Warehouse Builder Transformation Guide
Release 9.2

Part Number B12001-01
Go To Table Of Contents
Contents
Go To Index
Index

Go to previous page Go to next page

A
Using Slowly Changing Dimensions


Note:

Warehouse Builder only supports Slowly Changing Dimensions with Oracle9i Release 2 or later database servers.


This appendix provides a brief introduction to the different types of Slowly Changing Dimensions. It also goes through a case study scenario to demonstrate how to use Warehouse Builder to design and deploy different types of Slowly Changing Dimensions. For additional information, refer to books that discuss data warehousing such as The Data Warehouse Toolkit by Ralph Kimball.

This appendix contains the following topics:

About Slowly Changing Dimensions

A Slowly Changing Dimension (SCD) is a well-defined strategy to manage both current and historical data over time in a data warehouse. You must first decide which type of slowly changing dimension to use based on your business requirements. Table A-1 describes the three main types of SCDs.

Table A-1 Types of Slowly Changing Dimensions
Type Use Description Preserves History?

Type 1

Overwriting

Only one version of the dimension record exists. When a change is made, the record is overwritten and no historic data is stored.

No

Type 2

Creating Another Dimension Record

There are multiple versions of the same dimension record, and new versions are created while old versions are still kept upon modification.

Yes

Type 3

Creating a Current Value Field

There are two versions of the same dimension record: old values and current values, and old values are kept upon modification on current values.

Yes

After selecting the type of SCD, proceed with the following steps to create the dimensions:

Case Study Scenario

In order to use slowly changing dimensions, you must be using the following:

In this appendix, we will be demonstrating how to construct SCDs with the source and target systems described below. We will be using star schema to store data for all levels on the same dimension table target. This is one of the most commonly used strategy.

Source System

The geography data source table GEO_SRC will be used in our case study. Figure A-1 shows the attributes of the GEO_SRC table.

Figure A-1 GEO_SRC Table Properties

Text description of scd_src.gif follows.

Text description of the illustration scd_src.gif

Target System

The target warehouse that will be created includes the following:

A geography dimension will be used as our case study for illustration. Typically a geography dimension has two levels: city and state. A city level is the lowest level among the geography hierarchy, while a state level is the higher level. A simplified city level, shown in Figure A-2, has the following attributes:

Figure A-2 City Level Dimension Properties

Text description of scd_geo_.gif follows.

Text description of the illustration scd_geo_.gif

A simplified state level, as shown in Figure A-3, has the following attributes:

Figure A-3 State Level Dimension Properties

Text description of scd_geoa.gif follows.

Text description of the illustration scd_geoa.gif

Using Type 1 Slowly Changing Dimensions

With Type 1 SCDs, you keep no history and only store the latest value of the dimension record. Once you define the dimension GEO_DIM, you can use it in your mapping to load data into it. To load Type1 slowly changing dimensions, you extract data from the source and then directly load them into the target. GEO_SRC is the source table from which data will be loaded into the dimension GEO_DIM. Figure A-4 shows the mapping used in this example.

Figure A-4 Type 1 SCD Mapping

Text description of scd_map_.gif follows.

Text description of the illustration scd_map_.gif

Use the following steps to finish creating the Type 1 SCD:

Step 1: Populate the Surrogate Key

To ensure that unique numbers are assigned for surrogate keys for new dimension records, a sequence operator is used to map to the surrogate key column of GEO_DIM, which is CITY_ID (lowest level key).

Step 2: Configure the Target Properties

You should configure the properties for GEO_DIM operator, as shown in Figure A-5, to ensure data loads properly. First, you need to configure the loading type of GEO_DIM to be 'UPDATE/INSERT'.

Figure A-5 Mapping Dimension Properties

Text description of scd_mapa.gif follows.

Text description of the illustration scd_mapa.gif

You also need to configure each mapped column.

Step 3: Generate Code

If your target database type, which is configurable from warehouse module configuration properties, is set to Oracle9i, the MERGE feature is ensured for you when you generate code.

Using Type 2 Slowly Changing Dimensions

With Type 2 SCD, you always create another version of dimension record and mark the existing version as history. To accommodate this, you need to create extra metadata for your dimension table, including an effective date column and an expiration date column. These columns are used to differentiate a current version from a historical version as follows:

You also need to decide which columns you want to store historic data for when the values are to be changed. These columns are defined as trigger columns and should be described as part of your metadata.

Once you define your dimension GEO_DIM_TYPE2, you can use it in your mapping to load data into it. GEO_SRC is the sample source table here from which data are to be loaded into GEO_DIM_TYPE2.

To load Type 2 slowly changing dimension, you need to transform data extracted from the source properly before you load them into the target. You achieve this by creating a mapping, such as the one displayed in Figure A-6. In this mapping, data is first extracted from GEO_SRC, transformed by a series of operators, and finally loaded into GEO_DIM_TYPE2.

Figure A-6 Type 2 SCD Mapping

Text description of scd_mapb.gif follows.

Text description of the illustration scd_mapb.gif

You must be very curious about how data are actually transformed. Warehouse Builder supports all operators you would need for Type 2 slowly changing dimension. With Warehouse Builder, the whole ETL process of Type 2 slowly changing dimension can be done in one single mapping. Let us take a look at how data are transformed in a step-by-step fashion.

Use the following steps to create a Type 2 SCD:

Step 1: Detect a Match

First of all, for each source row from GEO_SRC, you need to figure out if it has matched a current dimension record in GEO_DIM_TYPE2. To do this, a Joiner is used to match GEO_SRC with GEO_DIM_TYPE2 exclusively using outer join by natural key columns as the join condition. Figure A-7 shows the expression used for this condition.

Figure A-7 Input_Row Expression

Text description of scd_join.gif follows.

Text description of the illustration scd_join.gif

Also notice that GEO_SRC should only match current dimension records in GEO_DIM_TYPE2, rather than history dimension records. To do this, you apply a filter operator to filter out history records from matching.

Step 2: Split Join Results

After Joiner, the output data are now composing both the source data rows and the matched target rows. For each output row of Joiner, you need to categorize it into the following groups:

Do this categorizing by splitting the Joiner output into OPEN_SET and CLOSE_SET groups using a Splitter.

A Joiner output row will be put into OPEN_SET group if it comes from a row in GEO_SRC that is either matching with any current version in GEO_DIM_TYPE2, or matching with no version. Do this by specifying the splitter condition for OPEN_SET group.

A Joiner output row will be put into CLOSE_SET group if both the following two condition are true:

Specify the splitter condition for CLOSE_SET group to AND the above two condition clauses.

Step 3: Determine Merge Rows

With OPEN_SET and CLOSE_SET, you compute the following two delta sets with which GEO_DIM_TYPE2 is to be loaded:

You use Expressions to accomplish both tasks. UPDATE_DELTA_ROW and MERGE_DELTA_ROW are created as two separate Expression operators from output of CLOSE_SET and OPEN_SET, respectively. The output groups of both Expression operators are then UNION by utilizing a SetOp operator, whose output row set is ready to be mapped to GEO_DIM_TYPE2 directly.

Step 4: Use the Expression UPDATE_DELTA_ROW

UPDATE_DELTA_ROW represents the row set that the final target row is to be overwritten from in order to mark a current matched version as historical. Specifically, the target expiration timestamp need be updated with current system date value. This operation is also known as to close the current version. To accomplish this, you specify the expression of attribute DATE_EXP to be SYSDATE.

For the rest of the columns, you do not need to update them such that the original target column values are specified for the corresponding expressions.

Step 5: Use the Expression MERGE_DELTA_ROW

MERGE_DELTA_ROW represents the row set that the final target row is to be overwritten from in order to:

Specifically, you need to build the expression for each final target column to differentiate between the above two scenarios by instantiating a CASE expression, i.e. 'Case When (...) Then (...) Else (...) End'. Fortunately, Warehouse Builder supports a user-friendly expression builder to accomplish this easily.

For DATE_EFF or any effective timestamp column, you specify the expression to:

Figure A-8 shows an example of how you specify the expression for DATE_EFF or any effective timestamp column.

Figure A-8 Expression for DATE_EFF

Text description of date_eff.gif follows.

Text description of the illustration date_eff.gif

For DATE_EXP or any expiration timestamp column, you specify the expression to:

Figure A-9 shows an example of how you specify the expression for DATE_EXP or any expiration timestamp column.

Figure A-9 Expression for DATE_EXP

Text description of date_exp.gif follows.

Text description of the illustration date_exp.gif

For CITY_NAME or any natural key column, you always overwrite with natural key value derived from source. Figure A-10 shows an example of how you specify the expression for CITY_NAME or any natural key column.

Figure A-10 Expression for CITY_NAME

Text description of city_nam.gif follows.

Text description of the illustration city_nam.gif

For CITY_ID_KEY or any surrogate key column, you need to preserve the surrogate key value derived from target in order to:

Figure A-11 shows an example of how you specify the expression for CITY_ID_KEY or any surrogate key column.

Figure A-11 Expression for CITY_ID_KEY

Text description of city_id_.gif follows.

Text description of the illustration city_id_.gif

For STATE_NAME or any non-trigger column, you always overwrite with the value derived from source.

For CITY_POPULATION or any trigger column, you always overwrite with the value derived from source.

Step 6: Populate Surrogate Keys

To ensure that unique numbers are assigned as surrogate keys for new dimension records, a sequence operator is used to insert the surrogate key column of GEO_DIM_TYPE2, which is CITY_ID.

The derived target surrogate key from UNION would be used to match with the final target surrogate key during loading. To achieve this, you create an additional attribute MATCHING for the final target and then map from the derived target surrogate key CITY_ID_KEY to it.

MATCHING attribute stands for the unique key of the final target that is chosen to be the matching criteria to ensure data loads properly. Here you should use the final target surrogate key column CITY_ID as MATCHING attribute. You achieve this by setting the bound name to be the same as CITY_ID:

Step 7: Configure Target Properties

Figure A-12 shows an example of how you can configure the properties for GEO_DIM_TYPE2 operator to ensure that data loads properly. First of all, you need to configure the loading type of GEO_DIM_TYPE2 to be UPDATE/INSERT.

Figure A-12 Configuration using UPDATE/INSERT

Text description of up_inser.gif follows.

Text description of the illustration up_inser.gif

You also need to configure each mapped column.

Step 8: Generate Code

If your target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is ensured for you when you generate code.

Using Type 3 Slowly Changing Dimension

With Type 3 SCD; you create a current value field to keep the current value of dimension record apart from its previous value. To achieve this, you need to create two columns for each data field, one for current value and the other for keeping previous value, respectively.

Once you define your dimension GEO_DIM_TYPE3, you can use it in your mapping to load data into it. GEO_SRC is the sample source table here from which data are to be loaded into GEO_DIM_TYPE3.

To load Type 3 slowly changing dimension, you extract data from the source and then transform them before directly load them into the target. You achieve this by the following mapping graph where data are first extracted from GEO_SRC, transformed by a series of operators, and finally loaded into GEO_DIM_TYPE3. Figure A-13 shows an example of this mapping.

Figure A-13 Type 3 SCD Mapping

Text description of scd_mapc.gif follows.

Text description of the illustration scd_mapc.gif

Use the following steps to create a Type 3 SCD:

Step 1: Detect a Match

First of all, for each source row from GEO_SRC, you need to figure out if it has matched a current dimension record in GEO_DIM_TYPE3. To do this, a Joiner is used to match GEO_SRC with GEO_DIM_TYPE3 exclusively (using outer join) by natural key columns as the join condition.

Step 2: Populate Current Values

For Type 3 SCD, you always overwrite current value columns of the target with that of the source. You accomplish this by creating mapping lines from Joiner output directly into the target, GEO_DIM_TYPE3.

Step 3: Populate Previous Value Columns by Expression

For Type 3 SCD, it matters to you when and how to overwrite previous value columns of the target, including CITY POPULATION_OLD, CITY STATE_BUDGET_OLD, and CITY STATE_NAME_OLD.

Specifically you need to:

To achieve this, build an Expression from the previous Joiner result and instantiate the expression using a CASE expression. Figure A-14 shows an example of how to instantiate the expression using a CASE expression.

Figure A-14 Case Expression

Text description of case.gif follows.

Text description of the illustration case.gif

Step 4: Populate Surrogate Keys

This is similar to what you have done for Type 1 SCDs. Refer to "Step 1: Populate the Surrogate Key" for details.

Step 5: Configure Target Properties

This is also similar to what you have done for Type 1 SCDs.

Step 6: Generate Code

If your target database type (configurable from warehouse module configuration properties) is set to Oracle9i, the MERGE feature is ensured for you when you generate code.

Deploying and Loading Slowly Changing Dimensions

Once you have constructed dimensions and mappings, you proceed to deploy and execute them through Deployment Manager. Run mappings using set-based mode with an Oracle9i database as your target to ensure optimal performance.


Go to previous page Go to next page
Oracle
Copyright © 2003 Oracle Corporation.

All Rights Reserved.
Go To Table Of Contents
Contents
Go To Index
Index