| Oracle9i Warehouse Builder User's Guide Release 9.2 Part Number B10996-01 |
|
This chapter provides details on how to use operators in a mapping to transform data and how to use the Expression Builder to create expressions. Operators are listed alphabetically.
This chapter includes the following topics listed in a columnar table that reads down the columns from left to right to conserve space:
|
N/A |
Some of the data flow operators described in this chapter require that you create expressions. An expression is a statement or clause that transforms data or specifies a restriction. These expressions are portions of SQL that are used inline as part of a SQL statement. Each expression belongs to a type that is determined by the role of the data flow operator. You can create expressions using Expression Builder, or by typing them into the expression field located in the operator or attribute property windows.
You can open the Expression Builder from the operator property windows in the operators such as filters, joiners, splitters, and aggregators.
You can open the Expression Builder from the attribute property windows in the operators such as expressions, data generators, and constants.
To open the Expression Builder:
The Expression Builder displays as shown in Figure 8-2.
This verifies the accuracy of the expression syntax.
The Expression Builder contains the following parts:
Beginning in Oracle 9i, the CASE function is recommended over the DECODE function because the CASE function generates both SQL and PL/SQL while DECODE is limited to SQL. If you use the DECODE function in an expression, Warehouse Builder promotes it to CASE where appropriate during code generation. This allows you to deploy the DECODE functionality in all operating modes (setbased, rowbased, etc.) and transparently across Oracle database releases (8.1, 9.0 and higher).
For example, Warehouse Builder converts the function
DECODE (T1.A, 1, `ABC', 2, `DEF', 3, `GHI', `JKL')
to the following:
CASE T1.A WHEN 1 THEN `ABC' WHEN 2 THEN `DEF' WHEN 3 THEN `GHI' ELSE `JKL'
The Aggregator operator calculates data aggregations, such as summations and averages and provides an output row set with the aggregated data.
Because each Aggregator operator shares a GROUP BY and HAVING clause, each attribute in the output group has the same cardinality. The number of rows in the output row set is less than or equal to the number of input rows.
The Aggregator operator has one input group and one output group. Connecting the source to the input group produces the corresponding aggregated row set in the output group. The resulting output includes the column used by the GROUP BY or the HAVING clause.
The Aggregator operator contains the following properties:
A DISTINCT result removes all duplicate rows before the average is calculated, ensuring a correct result in the above example.
An ALL result returns an average value on all rows.
If no aggregation function is necessary, specify NONE for the function. Specifying NONE on the attribute aggregation automatically adds the attribute to the resulting GROUP BY function.
Warehouse Builder displays the Operator Editor.
The Expression dialog displays as shown in Figure 8-3.
To define the Group By Clause:
The Group By Clause dialog displays as shown in Figure 8-4.
sum(INGRP1.OR_TOTAL) > 10000 as shown in Figure 8-5.
The Constant operator enables you to define constants. Constants are initialized at the beginning of the execution of the mapping. Constant values can be used anywhere in a mapping.
The Constant operator produces a single output group that can contain one or more constant attributes. For any defined constant data type, the output expression must be a valid SQL expression returning a value of the same data type. For VARCHAR, CHAR, or VARCHAR2 data types, you must enclose constant string literals within single quotes. For example, `my_string'.
The Constant operator contains the following property:
To use a constant operator in a mapping:
Warehouse Builder displays the Operator Editor.
The Attributes properties window displays.
Use a Data Generator operator to provide information such as record number, system date, and sequence values.
|
Recommendation: For PL/SQL mappings use a Constant Operator or Mapping Sequence Operator instead of a Data Generator. |
For mappings with flat file sources, the Data Generator operator also provides a place to enter constant information. For mappings with Flat File sources and targets, the Data Generator operator connects the mapping to SQL*Loader to generate the data stored in the database record.
The following functions are available:
It is possible for Warehouse Builder to generate data by specifying only sequences, record numbers, system dates, and constants as field specifications. SQL*Loader inserts as many records as are specified by the LOAD keyword.
Use the RECNUM keyword to set an attribute to the number of the records that the record was loaded from. Records are counted sequentially from the beginning of the first data file, starting with record 1. RECNUM increments as each logical record is assembled. It increments for records that are discarded, skipped, rejected, or loaded. For example, if you use the option SKIP=10, the first record loaded has a RECNUM of 11.
A column specified with SYSDATE gets the current system date, as defined by the SQL language SYSDATE function.
The target column must be of type CHAR or DATE. If the column is of type CHAR, then the date is loaded in the format dd-mon-yy. After the load, you only access it in that format. If the system date is loaded into a DATE column, then you can only access it in a variety of formats including the time and the date. A new system date/time is used for each array of records inserted in a conventional path load and for each block of records loaded during a direct path load.
The SEQUENCE keyword ensures a unique value for a column. SEQUENCE increments for each record that is loaded or rejected. It does not increment for records that are discarded or skipped.
The combination of column name and the SEQUENCE function is a complete column specification. Table 8-1 lists the options available for sequence values.
If a record is rejected because of a format error or an Oracle error, the generated sequence numbers are not reshuffled to mask this. For example, if four rows are assigned sequence numbers 10, 12, 14, and 16 in a column, and the row with 12 is rejected, the three rows inserted are numbered 10, 14, and 16, not 10, 12, 14. The sequence of inserts is preserved despite data errors. When you correct the rejected data and reinsert it, you can manually set the columns to match the sequence.
Although the Data Generator operator has only one output group, it has predefined attributes corresponding to Record Number, System Date, and a typical Sequence. While modification of these attributes is not recommended, you can create new attributes. The Data Generator operator is only valid for a SQL*Loader mapping.
The Data Generator attribute contains the following property:
Warehouse Builder displays the Operator Editor.
Warehouse Builder displays the pre-defined output attributes RECNUM, SYS_DATE, and SEQUENCE.
Warehouse Builder displays the Attribute Properties window.
The Deduplicator enables you to remove duplicate data in a source by placing a DISTINCT clause in the select code represented by the mapping.
To remove duplicates:
The Expression operator enables you to write SQL expressions that define non-procedural algorithms for one output parameter of the operator.
The expression text can contain combinations of input parameter names, variable names, and library functions. Use the Expression operator to transform the column value data of rows within a row set using SQL-type expressions, while preserving the cardinality of the input row set. To create these expressions, open the Attribute properties window for the output attribute and then open the Expression Builder.
The Expression operator contains only one input group and one output group. These are created automatically when you drop the operator onto the Mapping Editor canvas.
The output expressions for this operator cannot contain any aggregation functions. To use aggregation functions, use the Aggregator operator. See "Aggregator Operator".
Every output attribute in an Expression operator contains the following properties:
Warehouse Builder copies the input attributes into the operator.
Warehouse Builder displays the Operator Editor.
You can conditionally filter out rows using the Filter operator.
The Filter operator filters data from a source to a target by placing a WHERE clause in the code represented by the mapping. You connect a source operator to the Filter operator, apply a filter condition, and send a subset of rows to the next operator.
A Filter operator has only one input/output group that can be connected to both a source and target row set. The resulting row set is a filtered subset of the source row set based on a boolean filter condition expression.
The Filter operator contains the following property:
When you generate a mapping that includes a Filter operator, the Code Viewer displays filter condition expressions as a WHERE clause for set-based view mode. The filter input names in the original filter condition are replaced by actual column names from the source table, qualified by the source table alias.
To use a filter operator in a mapping:
The Filter Properties window displays.
You can use the Joiner operator to join multiple row sets from different sources with different cardinalities, and produce a single output row set.
The Joiner operator uses a boolean condition that relates column values in each source row set to at least one other row set.
If the input row sets are related through foreign keys, that relationship is used to form a default join condition. You can use this default condition or you can modify it. If the sources are not related through foreign keys, then you must define a join condition.
If the default foreign keys result in duplicate WHERE clauses, the Joiner operator will remove the duplicate clauses. This can happen if the join condition references several foreign keys. For example, if table T1 has a foreign key FK1 point to unique key UK1 in table T2 and table T2 has a foreign key FK2 pointing to unique key UK2 in T1, the resulting join condition
T1.A = T2.A AND T1.B = T2.B /*All instances of FK1 -> UK1 are reduced to one where clause*/ AND T2.B = T1.B AND T2.C = T1.C /*All instances of FK2 -> UK2 are reduced to one where clause*/
is generated by the Joiner operator as
T2.A = T2.A AND T1.B = T2.B AND T2.C = T1.C
The Joiner operator contains the following properties:
Attributes in the Joiner operator contain the following properties:
The output attributes are created with data types matching the corresponding input data types.
The Joiner properties window displays.
The join condition expression cannot contain aggregation functions, such as SUM. Compile errors result when deploying the generated code for the mapping. A Joiner can have an unlimited number of input groups but only one output group.
The order of input groups in a joiner is used as the join order. The major difference between ANSI join and an Oracle join is that ANSI join must clearly specify join order. An Oracle join does not require it.
The filter condition is applied after join. For example, consider the following join:
Input1.c --- + Input2.c --- +---> Joiner Input3.c --- +
with the following conditions:
The first two conditions are true joins while the third is a filter condition. If ANSI code is to be generated, Warehouse Builder interprets the statement as
select ... from Input1 full outer join Input2 on (Input1.c = Input2.c) join Input3 on (Input2.c = Input3.c) where Input1.c is not null;
If your target warehouse is based on Oracle9i or a later version, the Warehouse Builder joiner also supports the full outer join. To specify a full outer join condition, you must place the (+) sign on both sides of a relational operator. For example,
T1.A (+) = T2.B (+)
The results of the full outer join are as follows:
When using the Oracle SQL syntax for partial outer join such as T1.A = T2.B (+), if you place a (+) sign on both sides of the relational operator, it is invalid Oracle SQL syntax. However, Warehouse Builder translates any condition with the double (+) sign into ANSI SQL syntax. For example,
SELECT ... FROM T1 FULL OUTER JOIN T2 ON (T1.A = T2.B);
SELECT ... FROM T1, T2 WHERE T1.A = T2.B;
T1.A (+) = T2.A (+) and T2.A = T3.A
Warehouse Builder generates the following:
SELECT ... FROM T1 FULL OUTER JOIN T2 ON (T1.A=T2.A) JOIN T3 ON (T2.A=T3.A);
If you create input groups in another order, such as T1, T3, T2. Warehouse Builder will generate the following:
SELECT ... FROM T1 JOIN T3 ON (1=1) JOIN T2 ON (T1.A=T2.A and T2.A=T3.A);
When T1 and T3 are joined, there is no join condition specified. Warehouse Builder fills in a condition 1=1 (essentially a boolean true) and the two conditions you specified are used to join T2.
T1.A(+) = T2.A(+) and T1.B = T2.B
Warehouse Builder will generate a join statement instead of a full outer join because T1.B = T2.B is stronger than the full outer join condition between T1 and T2.
T1.A (+) = T2.A (+) amd T2.B = T3.B (+) will cause validation errors and code will not be generated.
In an equijoin, key values from the two tables must match. In a full outer join, key values are matched and nulls are created in the resulting table for key values that cannot be matched. A left or a right outer join retains all rows in the specified table.
In Oracle8i, you create an outer join in SQL using the join condition variable (+):
SELECT ...FROM A, BWHERE A.key = B.key (+);
This example is a left outer join. Rows from table A are included in the joined result even though no rows from table B match them. To create a full outer join in Oracle8i, you must use multiple SQL statements.
The Expression Builder allows the following syntax for a full outer join:
TABLE1.COL1 (+) = TABLE2.COL2 (+)
This structure is not supported by Oracle8i. Oracle9i is ANSI SQL 1999 compliant. The ANSI SQL 1999 standard includes a solution syntax for performing full outer joins. The code generator translates the above expression into an ANSI SQL 1999 full outer join statement, similar to:
SELECT ... FROM table1 FULL OUTER JOIN table2 ON (table1.col1 = table2.col2)
Because the full outer join statement complies to ANSI SQL 1999, it is only valid if the generated code is deployed to an Oracle9i database. Specifying a full outer join to an Oracle8i database results in a validation error.
A full outer join and a partial outer join can be used together in a single SQL statement, but it must in an AND or an AND/OR condition. If a full outer join and partial outer join are used in the OR condition, an unexpected AND condition will result. For example,
SELECT ... FROM table1 FULL OUTER JOIN table2 ON (A = B or C = D)
is evaluated by Oracle Server as A (+) = B (+) AND C = D.
Use the Key Lookup operator to lookup data from a table, view, cube, or dimension.
For example, you can use the Key Lookup operator when you define a mapping that loads a cube and when you define surrogate keys on the dimension. In this example, you create a Key Lookup operator that looks up the surrogate key in the dimension table and returns the corresponding original record to form the foreign key relationship.
The table, view, cube, or dimension is bound to the Key Lookup operator. You can have multiple Key Lookup operators in the same mapping.
The key that you look up can be any unique value. It does not need to be a primary or unique key, as defined in an RDBMS. The Key Lookup operator reads data from a lookup table using the key input you supply and finds the matching row. This operator returns a row for each input key.
The output of the Key Lookup operator corresponds to the columns in the lookup object. If multiple rows in the lookup table match the key inputs, the cardinality of the output differs from the input. This produces results inconsistent with the data flowing into the target operator and generates an error at runtime. To ensure that only a single lookup row is found for each key input row, use keys in your match condition.
You can use inbound reconciliation on Key Lookup outputs. Outbound reconciliation is disabled. See "Reconciling Operators and Repository Objects" beginning for more information.
Each output attribute for the key lookup has a property called DEFAULT VALUE. The DEFAULT VALUE property is used instead of NULL in the outgoing row set if no value is found in the lookup table for an input value. The generated code uses the NVL function. The Key Lookup always results in an outer-join statement.
When you validate this operator:
Warehouse Builder displays the Add Mapping Key Lookup dialog.
Warehouse Builder adds to the mapping canvas one Key Lookup operator for each table you select. For more information on using the Add Mapping Key Lookup dialog, see "Adding Bindable Operators".
Warehouse Builder displays the Key Lookup Properties window.
The Lookup Condition dialog displays as shown in Figure 8-9.
Choose attributes to compare to the selected lookup table column.
For a non-composite key, select an input attribute or a key from the drop-down list. Click Add to List. The column or input pairs are added to the table at the bottom of the main dialog.
For a composite key, click Add to List. The Match Key Columns to Input for Key dialog displays. Select the key input attributes from the Input Attribute drop-down list. Click OK.
Mapping Advanced Queue (AQ) operators are used to propagate messages in queues from source to target systems.
In Warehouse Builder, you can use AQs as sources or targets. You can map an AQ source object to a target table or a staging table and then deploy the mapping to your target database. When you run the AQ mapping, the changes are propagated from the AQ to the target table in your warehouse. After a mapping runs successfully, subsequent invocations of the same mapping will not see the messages that have already been processed. If the mapping fails, no messages will be dequeued from the source AQs.
Warehouse Builder also enables you to use AQs as targets in your warehouse. Source AQs in your repository may represent a central AQ that integrates data from different messaging systems or applications. Warehouse Builder enables you to map this source AQ to a target AQ and propagate the messages from one type of messaging system to another.
For more information, see:
To use a Mapping Advanced Queue operator in a mapping:
The Add Mapping Advanced Queue dialog displays.
Each AQ bound to an AQ operator in a mapping must belong to the same warehouse module as the mapping. If not, you will receive an error when you validate the mapping.
For more information on how to use the Add Mapping Advanced Queue dialog, see "Adding Bindable Operators".
A local company assigns a customer service representative to every customer by region. As the customer base increases, the company hires new service representatives to manage new and existing customers. The new service representative assignments to customers are tracked by using queues. You create an AQ with messages capturing the changes made to the Customer table.
You can import this AQ into Warehouse Builder and use it as a data source in a mapping. Connect the AQ source to the Customer target table on the mapping editor.Warehouse Builder generates PL/SQL scripts for this mapping, which you can deploy to your target warehouse. When you run the mapping, the messages are dequeued and the new assignments of the service representatives in the source Customer table are propagated to the Customer table in the target database.
If you register each mapping in Warehouse Builder as a separate subscriber of each AQ, then different mappings referencing the same AQ do not interfere with each other. Each mapping sees all the messages in the AQ published to all its subscribers.
You can perform an inbound reconciliation on an AQ operator to update it with any changes in the repository definition of the AQ object to which it is bound. For more information, see "Inbound Reconciliation". Warehouse Builder does not currently enable you to outbound reconcile an AQ operator.
You can configure the following properties for a Mapping AQ operator used in a mapping:
Warehouse Builder displays the Mapping Advanced Queue Properties dialog containing the Bound Name property. This name is used to identify the AQ during code generation.
Warehouse Builder displays the Mapping Advanced Queue Editor containing three tabs: General, Groups, Input/Output.
Rename the Mapping AQ operator by highlighting the previous name and typing over it. Type an optional description for the Mapping AQ operator.
Rename the Mapping AQ operator group by highlighting the previous name and typing over it. Each AQ operator has exactly one INOUT group. This is a read-only field. Type an optional description for the Mapping AQ operator group.
You cannot add more attributes to the INOUT group of an AQ operator. You can view and edit the attribute name, datatype, length, precision, scale, and optional description.
AQs deployed from Warehouse Builder are proxy AQs that represent the AQ source on a remote system. Because Oracle does support remote dequeuing of messages, you need to first deploy an agent of the source AQ in your local schema and register the locally deployed proxy AQ as a subscriber to the AQ on the source system. For the Warehouse Builder AQ mappings to run correctly, you must follow these steps in your source and warehouse systems:
execute on dbms_aq execute on dbms_aqadm execute on aq$_agent
execute on dbms_aq execute on dbms_aqadm execute on aq$_agent
The target user can now deploy the target proxy AQ to the target instance.
If an AQ is defined in your source system, you need to follow these manual steps to register the deployed proxy AQ as a subscriber to the source AQ and schedule propagation of the AQ to your target warehouses:
The following is an example of how you can register the AQ as a subscriber and schedule its propagation to the target system.
declare subscriber sys.aq$_agent; begin subscriber := sys.aq$_agent('subscribername','schema.queue2',0); -- schema.queue2 refers to the queue that is subscribing dbms_aqadm.add_subscribe(queue_name=>'QUEUE1',subscriber=>subscriber,rule=>' ',transformation=>''); -- QUEUE1 refers to the queue that 'queue2 is subscribing to dbms_aqadm.schedule_propagation('SCHEMA.QUEUE1',NULL,SYSDATE,'','','60'); -- SCHEMA.QUEUE1 are the schema and queuename of queue1. This command starts the message propagation from queue1 to queue2 with a latency of 60 seconds end;
You can use a Mapping Flat File operator as either a source or target.
However, the two are mutually exclusive within the same mapping. There are differences in code generation languages for flat file sources and targets. Subsequently, mappings can contain a mix of flat files, relational objects, and transformations, but with the restrictions discussed further in this section.
As a source, the Mapping Flat File operator acts as the row set generator that reads from a flat file using the SQL*Loader utility. You cannot use a flat file source operator if you are mapping to a flat file target or to an external table. When you design a mapping with a flat file source, you can use the following operators:
When you use a flat file as a source in a mapping, remember to create a connector from the flat file source to the relational target for the mapping to deploy successfully.
A mapping with a flat file target generates a PL/SQL package that loads data into a flat file instead of loading data into rows in a table. You can either use an existing flat file, or identify a new flat file as a target during design and define it using outbound reconciliation. See "Outbound Reconciliation".
When you use a flat file as a target, understanding both flat files and external tables will help you determine which feature to use. If you are loading large volumes of data, loading to a flat file allows you to use the DIRECT PATH SQL*Loader option, which results in better performance. If you are not loading large volumes of data, you can benefit from many of the relational transformations available in the external table feature. Refer to "External Tables versus Flat File Operators" for more information.
If you use a multiple-record-type flat file as a target, you can only map to one of the record types. If you want to load all of the record types in the flat file from the same source, you can either drop the same flat file into the mapping as a target again and map to a different record type, or create a separate mapping for each record type you want to load.
You have the following options for Mapping Flat File operators:
This scenario describes using a flat file object that has been previously imported and sampled. You can find instructions for importing and sampling flat file objects in Chapter 4, "Importing Data Definitions".
To use a previously imported flat file as a source or a target:The Add File dialog displays.
In this option, you use a flat file as either a source or target, but you have not yet imported or sampled the flat file. You import the flat file as you design your mapping.
To import and bind a new flat file into your mapping:The Add File dialog displays.
The Import File Wizard Welcome page displays. For more information on importing Flat Files, see "Using the Import Metadata Wizard Flat Files".
The Import File Wizard Object Selection page displays.
The Import File Wizard Summary and Import page displays. A red ball next to the file indicates that there is no metadata information available about the file structure.
The Mapping Flat File operator appears on the mapping canvas.
As you build your mapping, you can create a new flat file object by selecting the Create unbound mapping file with no attributes option. Using this method, the only option is to create a delimited, comma-separated, single-record-type flat file operator. You can leave the flat file unbound, or you can outbound reconcile to the repository.
|
Note: You can only perform outbound reconciliation if the flat file is new to that repository. For more information on reconciliation, refer to "Reconciling Operators and Repository Objects". |
The Add File dialog displays.
The Mapping Flat File operator with no attributes appears in your mapping.
For instructions on defining attributes, refer to "Editing Operators".
A new comma-separated flat file is created in your repository.
If you are extracting data from a multiple-record-type file with a master-detail structure and mapping to tables, add a Mapping Sequence operator to the mapping to retain the relationship between the master and detail records through a surrogate primary key or foreign key relationship. A master-detail file structure is one where a master record is followed by its detail records. In Example 8-1, records beginning with "E" are master records with Employee information and records beginning with "P" are detail records with Payroll information for the corresponding employee.
E003715 4 153 09061987 014000000 "IRENE HIRSH" 1 08500 P 01152000 01162000 00101 000500000 000700000 P 02152000 02162000 00102 000300000 000800000E003941 2 165 03111959 016700000 "ANNE FAHEY" 1 09900 P 03152000 03162000 00107 000300000 001000000E001939 2 265 09281988 021300000 "EMILY WELLMET" 1 07700 P 01152000 01162000 00108 000300000 001000000 P 02152000 02162000 00109 000300000 001000000
In Example 8-1, the relationship between the master and detail records is inherent only in the physical record order: payroll records correspond to the employee record they follow. However, if this is the only means of relating detail records to their masters, this relationship is lost when Warehouse Builder loads each record into its target table.
You can maintain the relationship between master and detail records if both types of records share a common field. If Example 8-1 contains a field Employee ID in both Employee and Payroll records, you can use it as the primary key for the Employee table and as the foreign key in the Payroll table, thus associating Payroll records to the correct Employee record.
However, if your file does not have a common field that can be used to join master and detail records, you must add a sequence column to both the master and detail targets (see Table 8-2 and Table 8-3) to maintain the relationship between the master and detail records. Use the Mapping Sequence operator to generate this additional value.
Table 8-2 represents the target table containing the master records from the file in Example 8-1. The target table for the master records in this case contains employee information. Columns E1-E10 contain data extracted from the flat file. Column E11 is the additional column added to store the master sequence number. Notice that the number increments by one for each employee.
Table 8-3 represents the target table containing the detail records from the file in Example 8-1. The target table for the detail records in this case contains payroll information, with one or more payroll records for each employee. Columns P1-P6 contain data extracted from the flat file. Column P7 is the additional column added to store the detail sequence number. Notice that the number for each payroll record matches the corresponding employee record in Table 8-2.
This section contains instructions on creating a mapping that extracts records from a master-detail flat file and loads those records into two different tables. One target table stores master records and the other target table stores detail records from the flat file. The Mapping Sequence is used to maintain the master-detail relationship between the two tables.
|
Note: These instructions are for conventional path loading. For instructions on using direct path loading for master-detail records, see "Direct Path Loading for Performance". |
This procedure outlines general steps for building such a mapping. Additional detailed instructions are available:
When naming the record types as you sample the file, assign descriptive names to the master and detail records, as shown in Figure 8-11. This makes it easier to identify those records in the future.
Figure 8-11 shows the Flat File Sample Wizard for a multiple-record-type flat file containing department and employee information. The master record type (for employee records) is called EmployeeMaster, while the detail record type (for payroll information) is called PayrollDetail.
You can either select an existing repository table that you created earlier or create a new unbound mapping table with no attributes. You can then map or copy all required fields from the master record of the file operator to the master table operator (creating columns) and perform an outbound reconciliation to define the table later.
The table must contain all the columns required for the master fields you want to load plus an additional numeric column for loading sequence values, as shown in Figure 8-12.
You can either select an existing repository table that you created earlier or create a new unbound mapping table with no attributes. You can then map or copy all required fields from the master record of the file operator to the master table operator (creating columns) and perform an outbound reconcile to define the table later.
The table must contain all the columns required for the detail fields you want to load plus an additional numeric column for loading sequence values.
NEXTVAL attribute to the additional sequence column in the master table, as shown in Figure 8-13.
CURRVAL attribute to the additional sequence column in the detail table, as shown in Figure 8-13.
Figure 8-13 shows a completed mapping with the flat file master fields mapped to the master target table, the detail fields mapped to the detail target table, and the NEXTVAL and CURRVAL attributes from the Mapping Sequence mapped to the master and detail target tables, respectively.
Direct Mode: False
Errors Allowed: 0
Row: 1
Trailing Nullcols: True (for all tables)
This section contains error handling recommendations for files with varying numbers of errors.
If your data file almost never contains errors:
Direct Mode= false
ROW=1
ERROR ALLOWED = 0
If the data file has errors, the loading stops when the first error happens.
CONTINUE_LOAD=TRUE
SKIP=number of records already loaded
seq_nextval column.
seq_currval column which references the master table PK.
In this case, master records with errors will be rejected with all their detail records. You can recover these records by following these steps.
Record 9: Rejected - Error on table "MASTER_T", column "C3". ORA-01722: invalid number Record 10: Rejected - Error on table "DETAIL1_T". ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found Record 11: Rejected - Error on table "DETAIL1_T". ORA-02291: integrity constraint (SCOTT.FK_SEQ) violated - parent key not found Record 21: Rejected - Error on table "DETAIL2_T". ORA-02291: invalid number
Load the records into independent tables. You can load the data in Direct Mode, with the following parameters that increase loading speed:
ROW>1
ERRORS ALLOWED=MAX
If you are using a master-detail flat file where the master record has a unique field (or if the concatenation of several fields can result in a unique identifier), you can use Direct Path Load as an option for faster loading.
For direct path loading, the record number (RECNUM) of each record is stored in the master and detail tables. A post-load procedure uses the RECNUM to update each detail row with the unique identifier of the corresponding master row.
This procedure outlines general steps for building such a mapping. Additional detailed instructions are available:
When naming the record types as you sample the file, assign descriptive names to the master and detail records, as shown in Figure 8-11. This will make it easier to identify those records in the future.
You can either select an existing repository table that you created earlier, or create a new unbound mapping table with no attributes and perform an outbound reconcile to define the table later.
The table must contain all the columns required for the master fields you plan to load plus an additional numeric column for loading the RECNUM value.
You can either select an existing repository table that you created earlier, or create a new unbound mapping table with no attributes and perform an outbound reconcile to define the table later.
The table must contain all the columns required for the detail fields you plan to load plus an additional numeric column for loading a RECNUM value, and a column that will be updated with the unique identifier of the corresponding master table row.
RECNUM attribute to the RECNUM columns in the master and detail tables, as shown in Figure 8-15.
If the master row unique identifier column is of a CHAR datatype, make the constant attribute a CHAR type with the expression `*'.
If the master row unique identifier column is a number, make the constant attribute a NUMBER with the expression `0'. Figure 8-14 shows the expression property of the constant attribute set to `0'. This constant marks all data rows as "just loaded."
Figure 8-15 shows a completed mapping with the flat file's master fields mapped to the master target table, the detail fields mapped to the detail target table, the RECNUM attributes from the Data Generator operator mapped to the master and detail target tables, respectively, and the constant attribute mapped to the detail target table.
Direct Mode: True
Errors Allowed: 0
Trailing Nullcols: True (for each table)
The following is an example of the generated SQL*Loader control file script:
OPTIONS ( DIRECT=TRUE,PARALLEL=FALSE, ERRORS=0, BINDSIZE=50000, ROWS=200, READSIZE=65536) LOAD DATA CHARACTERSET WE8MSWIN1252 INFILE 'g:\FFAS\DMR2.dat' READBUFFERS 4 INTO TABLE "MATER_TABLE" APPEND REENABLE DISABLED_CONSTRAINTS WHEN "REC_TYPE"='P' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS ( "REC_TYPE" POSITION (1) CHAR , "EMP_ID" CHAR , "ENAME" CHAR , "REC_NUM" RECNUM ) INTO TABLE "DETAIL_TABLE" APPEND REENABLE DISABLED_CONSTRAINTS WHEN "REC_TYPE"='E' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS ( "REC_TYPE" POSITION (1) CHAR , "C1" CHAR , "C2" CHAR , "C3" CHAR , "EMP_ID" CONSTANT '*', "REC_NUM" RECNUM
The following is an example of the post-update PL/SQL procedure:
create or replace procedure wb_md_post_update( master_table varchar2 ,master_recnum_column varchar2 ,master_unique_column varchar2 ,detail_table varchar2 ,detail_recnum_column varchar2 ,detail_masterunique_column varchar2 ,detail_just_load_condition varchar2) IS v_SqlStmt VARCHAR2(1000); BEGIN v_SqlStmt := 'UPDATE '||detail_table||' l '|| ' SET l.'||detail_masterunique_column||' = (select i.'||master_unique_column|| ' from '||master_table||' i '|| ' where i.'||master_recnum_column||' IN '|| ' (select max(ii.'||master_recnum_column||') '|| ' from '||master_table||' ii '|| ' where ii.'||master_recnum_column||' < l.'||detail_recnum_column||') '|| ' ) '|| ' WHERE l.'||detail_masterunique_column||' = '||''''||detail_just_load_condition||''''; dbms_output.put_line(v_sqlStmt); EXECUTE IMMEDIATE v_SqlStmt; END; /
After the initial loading of the master and detail tables, you can use the loaded sequence values to further transform, update, or merge master table data with detail table data. For example, if your master records have a column that acts as a unique identifier (such as an Employee ID), and you want to use it as the key to join master and detail rows (instead of the sequence field you added for that purpose), you can update the detail table(s) to use this unique column. You can then drop the sequence column you created for the purpose of the initial load. Operators such as the Aggregator, Filter, or Match and Merge operator can help you with these subsequent transformations.
You can introduce information external to Warehouse Builder as input into a mapping using a Mapping Input Parameter. For example, you can use an Input Parameter operator to pass SYSDATE to a mapping that loads data to a staging area. Use the same Input Parameter to pass the timestamp to another mapping that loads the data to a target.
When you a generate mapping, Warehouse Builder creates PL/SQL package. Mapping input parameters become part of the signature of the main procedure in the package.
The Mapping Input Parameter has a cardinality of one. It creates a single row set that can be combined with another row set as input to the next operator.
The names of the input attributes become the names of the mapping output parameters. The parameters can be used by connecting the attributes of the Mapping Input Parameters operator within the mapping editor. You can have only one Mapping Input Parameter in a mapping.
The Mapping Input Parameter contains the following properties:
The default value for the mapping input parameter appears in the DEFAULT clause following the function parameter declarations in the generated PL/SQL package. For example, if a mapping parameter named param1 with data type VARCHAR2 is defined with a default value of 'HELLO', the generated main function in the PL/SQL package appears as:
You can rename the attributes and define the data type and other attribute properties.
Use the Mapping Output Parameter operator to send values out of a mapping to applications external to Warehouse Builder. When you a generate mapping, Warehouse Builder creates a PL/SQL package. Mapping Output Parameters become part of the signature of the main procedure in the package.
The Mapping Output Parameter has only one input group. You can have only one Mapping Output Parameter in a map. Only attributes that are not associated with a row set can be mapped into a Mapping Output Parameter. For example, constant, input parameter, output from a pre-mapping process, or output from a post process can all contain attributes that are not associated with a row set. A Mapping Output Parameter is not valid for a SQL*Loader mapping.
The default value for the Mapping Output Parameter appears in the DEFAULT clause following the function parameter declarations in the generated PL/SQL package. For example, if a mapping parameter named param1 with data type VARCHAR2 is defined with a default value of 'HELLO', the generated main function in the PL/SQL package appears as:
If a Mapping Output Parameter named param1 has data type VARCHAR2, the generated main function in the PL/SQL package appears as:
The Mapping Output Parameter contains the following properties:
You can rename the attributes and define the data type and other attribute properties.
See Figure 8-17 for an example of an Mapping Output Parameter in a mapping.
A Mapping Sequence operator generates sequential numbers that increment for each row. For example, you can use the Sequence operator to create surrogate keys while loading data into a dimension table.
You can connect a Mapping Sequence to a target operator input or to the inputs of other types of operators. You can combine the sequence outputs with outputs from other operators.
This operator contains an output group containing the following output attributes:
You can bind and reconcile Mapping Sequences to a repository sequence in one of the modules. The repository sequence must be generated and deployed before the mapping containing the Mapping Sequence is deployed to avoid errors in the generated code package. See "Adding Bindable Operators" for more information.
Generate mappings with sequences using Row Based mode. Sequences are incremented even if rows are not selected. If you want a sequence to start from the last number, then do not run your SQL package in Set Based or in Set Based With Failover operating modes. See "Runtime Parameters Reference" for more information on configuring mode settings.
The Mapping Sequence contains the following property:
Warehouse Builder displays the Add Mapping Sequence dialog.
The Match-Merge operator is a data quality operator that you can use to first match and then merge data.
When you match records, you determine through business rules which records in a table refer to the same data. When you merge records, you consolidate into a single record the data from the matched records.
This section includes information and examples on how to use the Match-Merge operator in a mapping. The Match-Merge operator together with the Name-Address operator support householding, the process of identifying unique households in name and address data.
Consider how you could utilize the Match-Merge operator to manage a customer mailing list. Use matching to find records that refer to the same person in a table of customer data containing 10,000 rows. For example, you can define a match rule that screens records that have similar first and last names. Through matching you may discover that 5 rows refer to the same person. You can merge those records into one new record. For example, you can create a merge rule to retain the values from the one of the five matched records with the longest address. The newly merged table now contains one record for each customer.
Table 8-4 shows records that refer to the same person prior to using the Match-Merge operator.
Table 8-5 shows the single record for Jane Doe after using the Match-Merge operator. Notice that the new record retrieves data from different rows in the sample.
| FirstName | LastName | SSN | Address | Unit | Zip |
|---|---|---|---|---|---|
|
Jane |
Doe |
111111111 |
123 Main Street |
Apt 4 |
22222 |
Figure 8-18 shows a mapping you can design using a Match-Merge operator. Notice that the Match-Merge operator is preceded by a Name-Address operator, NAMEADDR, and a staging table, CLN_CUSTOMERS. You can design your mapping with or without a Name-Address operator. Preceding the Match-Merge operator with a Name-Address operator is desirable when you want to ensure your data is clean and standardized before launching time consuming match and merge operations.
Whether you include a Name-Address operator or not, be aware of the following considerations as you design your mapping:
You have the following options for using a match-merge operator:
Whether you are using the operator wizard or the Operator Editor, complete the following pages listed in a columnar table that reads down the columns left to right to conserve space:
Use the General page to specify a name and optional description for the operator. By default, the wizard names the match-merge operator "MM."
By definition, the Match-Merge operator has one input group and two output groups. You can rename the groups and add optional descriptions, but you cannot add or delete groups in the Match-Merge operator. The default names for the input group is INGRP1. The default names for the output groups are MERGE and XREF.
You assign attributes to the INGRP1 on the Input Connections page and then further edit those attributes on the Input Attributes page. You define attributes for the MERGE group on the Merge Output page. You can optionally assign attributes to the XREF group on the Cross Reference Output page.
Use the Input Connections page to select attributes to copy and map into the operator.
To complete the Input connections page for an operator:To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.
Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.
You can use the right to left arrow to move groups or attributes from the input connections list. Warehouse Builder removes the selection from the input group and removes the mapping line between the source operator and the current operator.
Use the Input page to modify the attributes you selected in the Input Connections tab or wizard page.
You can perform the following tasks from the Match-Merge Input Attributes page:
Use the Merge Output tab to specify the attributes for the output MERGE group. The MERGE group produces a consolidated record with the attributes you select.
Use the Cross Reference Output page to optionally select attributes for the XREF group. Although the Match-Merge operator creates the XREF group by default, you have the option of adding attributes to the group or leaving it empty.
The XREF group is an optional group you can define to document the merge process. It allows you to create a foreign key relationship between the original data set to the new merged data set. You can send the attributes from the XREF group to a table that records the corresponding source row for each merged row.
Every row from INGRP1 corresponds to a row in the XREF group. To design the XREF group, select original attribute values and merged attributes from Source Attributes on the left. Warehouse Builder displays the merged attributes in Source Attributes with a default prefix of "MM_". Use Set Prefix at the bottom left corner of the page to change the prefix.
Use the Match Bins page to limit the set of possible matches to a manageable number. When Warehouse Builder matches the rows, it compares each row with the subsequent row for all rows within the same grouping. This can greatly enhance performance since Warehouse Builder searches for matches only within groupings and not throughout the entire data set.
While you want to define Match Bins that separate records into manageable groupings, you also want to avoid separating records that should be matched. The attribute(s) you select for grouping like data depends on your data. For example, if you have a table of customer address with a million rows, you may want to group the data by partial street name, city name, and zip code.
Ideally, you should attempt to keep the number of records in each grouping under 2000. The number of comparisons Warehouse Builder must perform is based on the following formula:
n=(b*(b-1))/2
where n is number of comparisons and b is number of records in a bin.
To match 5 records, Warehouse Builder must perform 10 comparisons. To match 50 records, Warehouse Builder must perform 1,225 comparisons. To match 500 records, Warehouse Builder must perform 124,750 comparisons.
You can define match rules for a single attribute or multiple attributes in the operator. On the Match Rules tab, create match rules at the top of the page. In the lower portion of Match Rules tab, specify the details for each match rule.
If you create more than one match rule, Warehouse Builder determines two rows match if those rows satisfy any of the match rules. In other words, Warehouse Builder evaluates multiple match rules using OR logic. This is indicated on the Match Rules tab by the OR icon in the left most column. For more information, see "Understanding Matching Concepts".
Warehouse Builder assigns a position number and creates a default name such as MM_MA_0 for each match rule you add. You can edit and drag the rule to a new position in the rule list. You can designate a match rule as either Active Match Rules or Passive Match Rules. Warehouse Builder does not execute a passive rule unless you call it through a custom match rule. Assign one of the rule types listed in Table 8-7. When you select a rule type, the lower portion of the Match Rules tab activates and you can enter details for the match rule.
When you add details for conditional rule types, Warehouse Builder prompts you to add details for one or more attributes. When you add details for multiple attributes, Warehouse Builder displays the AND icon in the left most column. This indicates that Warehouse Builder matches rows only when all the condition details are satisfied.
Warehouse Builder executes a match rule if you designate as active. If more than one match rule is active, each is evaluated in turn until a match is found or until all rules are evaluated. If a match occurs, the records are considered a match.
Warehouse Builder does not directly execute passive match rules. Instead, you can create custom match rules that are active and call passive match rules. For each match rule that you define, Warehouse Builder creates a corresponding function with the same name as the match rule. Custom rules can call the passive match rule using this function.
Use this editor to create a custom comparison algorithm for the match rule. Select Edit to launch the Custom Match Rule Editor. Double click or drag and drop desired functions and parameters from the navigation tree on the left to the Implementation editor on the right. You can write a custom match rule that references active and passive match rules, functions, and parameters such as THIS_ and THAT_ which represent the two records from INGRP1 that are being compared.
You can also validate your custom rule by selecting Test and Validation from the Custom Match Rule Editor
Use the Merge Rules tab to select values for the attributes in the merged record.
On the Merge Rules tab, create merge rules at the top of the page. In the lower portion of the Merge Rules tab, specify the details for each merge rule.
Warehouse Builder assigns a position number and creates a default name such as MM_ME_0 for each merge rule you add. Warehouse Builder executes merge rules in the order of their position numbers. You can edit and drag the rule to a new position in the rule list. Assign one of the rule types listed in Table 8-6.
| Merge Rule Type | Select Single or Multiple Attributes | Description |
|---|---|---|
|
Any |
Single |
Select an attribute and Warehouse Builder assigns the first non-blank value for that attribute. |
|
Any Record |
Multiple |
In the lower portion of the tab, select multiple attributes and Warehouse Builder assigns those attributes based on any matched row. |
|
Copy |
Single |
Select an attribute and Warehouse Builder assigns the value of another merged attribute. |
|
Custom |
Single |
Select an attribute. Warehouse Builder assigns a value based on a PL/SQL code that you write. See Custom Merge Rule for details. |
|
Custom Record |
Multiple |
In the lower portion of the tab, select multiple attributes. Warehouse Builder assigns a value based on a PL/SQL code that you write. See Custom Merge Rule for details. |
|
Match Id |
Single |
Use this rule when you map the XREF group from one Match-Merge operator into the input of another Match-Merge operator. Select a sequence from the lower portion on the tab to generate the Match Id. |
|
Min Max Record |
Multiple |
In the lower portion of the tab, select multiple attributes. Warehouse Builder assigns the merge attribute value from the match record that contains the relative value of the selecting attribute. |
|
Min Max |
Single |
Select the first value out of the possible matches based on the order of another attribute. |
|
Rank |
Single |
Warehouse Builder assigns values to records based on rank expressions you define. Select a record to use to populate a group of attributes. When you select the record, you have access to all attributes of all records in the matched record set. |
|
Rank Record |
Multiple |
In the lower portion of the tab, select multiple attributes. Warehouse Builder assigns values based on rank expressions you define. |
|
Sequence |
Single |
In the lower portion of the tab, select the sequence from the sequences defined in the project. |
When you select a rule type, the lower portion of the Merge Rules tab activates and you can enter details for the merge rule.
When you define a merge rule, you can define one rule for all the attributes in the merged record or define a rule for each attribute. For instance, if the merged record is a customer record, it may have attributes such as ADDRESS1, ADDRESS2, CITY, STATE, and ZIP. You can write five rules, one for each attribute. Or, you can write one rule such that all five attributes come from the same matched row.
When you write a rule for a single attribute, specify the attribute in the Attribute list box at the top of the page. Then, if necessary, specify the details for the rule in the lower portion of the tab.
When you write a rule for multiple attributes, Warehouse Builder disables the Attribute list box at the top of the page and you must define the details in the lower portion of the tab.
Use this editor to create a custom merge rule that assigns a value based on a PL/SQL code that you write and returns the attribute type you previously selected.
Select Edit to launch the Custom Merge Rule Editor. Double click or drag and drop desired functions and attributes from the navigation tree on the left to the Implementation editor on the right. You can write a custom merge rule that references source attributes, other merge attributes, and functions.
You can also validate your custom rule by selecting Test and Validation from the Custom Merge Rule Editor.
Following is the Example of a Custom Merge Rule for an Attribute:
BEGIN RETURN M_MATCHES(1)."TAXID"; END;
Following is the Example of a Custom Merge Rule for a Record:
BEGIN RETURN M_MATCHES(1); END;
When you use Warehouse Builder to match records, you can define a single match rule or multiple match rules. If you create more than one match rule, Warehouse Builder determines two rows match if those rows satisfy any of the match rules. In other words, Warehouse Builder evaluates multiple match rules using OR logic. Table 8-7 lists the match rules you can specify.
| Match Rule | Description |
|---|---|
|
Address |
Matches records based on postal addresses. Indicate the attribute(s) that compose the address. Assign input roles to each attribute. For details, see "Address Match Rule" |
|
All Match |
Matches all the rows within the Match Bin. |
|
Conditional |
Matches rows based on an algorithm you select. Warehouse Builder matches rows only when all the condition details are satisfied. For details, see "Conditional Match Rule" |
|
Custom |
Create a custom comparison algorithm. Select Edit to launch the Custom Match Rule Editor. For more information, see "Custom Match Rule". |
|
Firm |
Matches records based on business names. Indicate the attribute(s) that compose the firm name. Assign input roles to each attribute. For details, see "Firm Match Rule" |
|
None Match |
Specifies that no rows match within the Match Bin. |
|
Person |
Matches records based on peoples names. Indicate the attribute(s) that compose the person name. Assign input roles to each attribute. For details, see "Person Match Rule" |
|
Weight |
Matches rows based on scores that you assign to attributes. Warehouse Builder compares each attribute using a similarity algorithm that returns a score between 0 and 100 to represent the similarity between the rows. For two rows to be considered a match, the total counts must be greater than the overall score you designate. For details, see "Weight Match Rule" |
The following example illustrates how Warehouse Builder evaluates multiple match rules using OR logic.
In the top portion of the Match Rules tab, create two match rules as described in the following table:
| Name | Position | Rule Type | Usage | Description |
|---|---|---|---|---|
|
Rule_1 |
1 |
Conditional |
Active |
Match SSN |
|
Rule _2 |
2 |
Conditional |
Active |
Match LastName and PHN |
In the lower portion of the tab, assign the details to Rule_1 as described in the following table:
| Attribute | Position | Algorithm | Similarity Score | Blank Matching |
|---|---|---|---|---|
|
SSN |
1 |
Exact |
0 |
Do not match if either is blank |
For Rule_2, assign the details as described in the following table:
| Attribute | Position | Algorithm | Similarity Score | Blank Matching |
|---|---|---|---|---|
|
LastName |
1 |
Exact |
0 |
Do not match if either is blank |
|
PHN |
2 |
Exact |
0 |
Do not match if either is blank |
Assume you have the data listed in the following table:
| Row | FirstName | LastName | PHN | SSN |
|---|---|---|---|---|
|
A |
John |
Doe |
650-123-1111 |
NULL |
|
B |
Jonathan |
Doe |
650-123-1111 |
555-55-5555 |
|
C |
John |
Dough |
650-123-1111 |
555-55-5555 |
According to Rule_1, rowsB and C match. According to Rule_2, rows A and B match. Therefore, since Warehouse Builder handles match rules using OR logic, all three records match.
The general rule is, if A matches B, and B matches C, then A matches C. Assign a conditional match rule based on similarity such as described in Table 8-8:
| Attribute | Position | Algorithm | Similarity Score | Blank Matching |
|---|---|---|---|---|
|
LastName |
1 |
Similarity |
80 |
Do not match if either is blank |
Assume you have the data listed in the following table:
| Row | FirstName | LastName | PHN | SSN |
|---|---|---|---|---|
|
A |
John |
Jones |
650-123-1111 |
NULL |
|
B |
Jonathan |
James |
650-123-1111 |
555-55-5555 |
|
C |
John |
Jamos |
650-123-1111 |
555-55-5555 |
Jones matches James with a similarity of 80 and James matches Jamos with a similarity of 80. Jones does not match Jomos because the similarity is 60 which is less than 80. However, because Jones matches James, and James matches Jamos, all three records match.
Use the Address match rule to match records based on postal addresses. Matching by address is most effective when you first correct the address data using the Name-Address operator before the Match-Merge operator. The Name-Address operator identifies addresses as existing in a postal matching database and designates the records with the Is Found flag. The Match-Merge operator processes addresses with the Is Found role faster because the data is known to be syntactically correct, legal, and existing.
To define an Address match rule, complete the following steps:The Address Attributes tab and Details tab display at the bottom of the page.
You must perform this step. If you do not assign the Primary Address role, the match rule is ineffective and you can not access the Details tab.
Table 8-10 describes the Address Roles you can assign for the Address match rule.
Table 8-11 describes the options you can assign to Address Roles.
Use the Conditional Match Rule to combine multiple attribute comparisons into one composite rule. When you assign multiple attributes for comparison, all the comparisons must be true for the records to be considered a match.
To define a Conditional match rule, complete the following steps:The operator displays a Details section at the bottom of the tab.
Table 8-12 Algorithms for Match Rules
Use the Firm match rule to match records by business name. Matching by business name is most effective when you first correct the address data using the Name-Address operator before the Match-Merge operator.
To define a Firm match rule, complete the following steps:The Firm Attributes tab and Details tab display at the bottom of the page.
By default, the operator compares the values in Firm 1 for exact matches. You can change this default behavior by making selections on the Details tab.
For the match rule to be valid, you must assign at least one attribute either as Firm 1 or Firm 2.
If you select Strip noise words, the operator ignores words in the business names such as "the" and "and". If you select Cross match firm 1 and firm 2, the operator compares business names in firm 1 with business names in firm 2.
See Table 8-12for descriptions of the remaining options on the Details tab.
Use the Person match rule to match records based on names. Matching by names is most effective when you first correct the address data using the Name-Address operator before the Match-Merge operator.
To define a Person match rule, complete the following steps:The Person Attributes tab and Details tab display at the bottom of the page.
Table 8-13 describes the Name Roles you can assign for the Name match rule.
Table 8-14 lists the options you can select from the Details tab in the Name match rule.
| Option | Description |
|---|---|
|
Match on initials |
The operator matches initials to names such as "R." and "Robert". You can select this option for first name and middle name roles. |
|
Match on substrings |
The operator matches substrings to names such as "Rob" to "Robert". You can select this option for first name and middle name roles. |
|
Similarity |
See Table 8-12. |
|
Soundex |
See Table 8-12. |
|
Detect compound name |
The operator matches compound names to names such as "De Anne" to "Deanne". You can select this option for the first name role. |
|
"Mrs" Match |
The operator matches prenames to first and last names such as `Mrs. Washington" to "George Washington". You can select this option for the prename role. |
|
Match hyphenated names |
The operator matches hyphenated last names to unhyphenated last names such as "Reese-Jones" to "Reese". You can select this option for the last name role. |
|
Detect missing hyphen |
The operator detects missing hyphens such as matching "Hillary Rodham Clinton" to "Hillary Rodham-Clinton". You can select this option for the last name role. |
|
Detect switched name order |
The operator detects switched name orders such as matching "Elmer Fudd" to "Fudd Elmer". You can select this option if you selected First Name and Last Name roles for attributes on the Person Attributes tab. |
Use this rule to match rows based on a weight value that you assign. A weighted match rule is most useful when comparing a large number of attributes, without having a single attribute that is different causing a non-match, as can happen with conditional rules.
To use the Weight match rule, complete the following steps:The Details tab display at the bottom of the page.
In Maximum Score, assign a weight to each attribute you want to include in the comparison. Warehouse Builder compares each attribute using a similarity algorithm that returns a score between 0 and 100 to represent the similarity between the rows. A value of 100 indicates that the two values are identical. A value of zero indicates there is no similarity.
For two rows to be considered a match, the total counts must be greater than the Required score you designate.
Assume you want to apply the Weight match rule to the data in Table 8-15:
| Record Number | Attr_1 | Attr_2 |
|---|---|---|
|
Rec_1 |
CA |
|
|
Rec_2 |
CA |
|
|
Rec_3 |
CA |
QR |
For Maximum score, you assign a value of 50 to both Att_1 and Att_2. You assign a value of 80 for the Required score to match. You can expect the following results:
Once you pass data through a Match-Merge operator, you may want to further refine the data. For example, when householding name and address data, you may need to merge the data first for addresses and then again for names. Assuming you map the MERGE output to a target table, you can map the XREF group to either a staging table or to another Match-Merge operator. Although mapping to a staging table is relatively easy to design, it can lead to significant loss of performance. Map the XREF group directly to another Match-Merge operator to avoid loss of performance.
Figure 8-19 shows a mapping that relies on a staging table between two Match-Merge operators MM and MM_1.
Figure 8-20 shows a mapping that achieves the same results with better performance. The XREF group from MM is mapped directly to MM_1. For this mapping to be valid, you must assign the Match Id generated for the first XREF group as the Match Bin rule on the second Match-Merge operator.
This section includes the following topics:
This section contains introductory material followed by instructions for using the operator. For advanced information on the Name and Address operator, see Chapter 20, "Data Quality: Name and Address Cleansing".
Oracle9i Warehouse Builder enables you to perform name and address cleansing on your data with the Name and Address operator. The Name and Address operator identifies and corrects errors and inconsistencies in name and address source data by comparing input data to the data libraries supplied by the third-party name and address cleansing software vendors. You can purchase the data libraries directly from these vendors.
The errors and inconsistencies corrected by the Name and Address operator include variations in address formats, use of abbreviations, misspellings, outdated information, inconsistent data, or transposed names. The operator fixes these errors and inconsistencies by:
Augmenting addresses with geographic information facilitates geography-specific marketing initiatives, such as marketing only to customers in large metropolitan areas (for example, within an n-mile radius from large cities); marketing only to customers served by a company's stores (within x mile radius from these stores). Oracle Spatial, an option with Oracle9i, and Oracle Locator, packaged with Oracle9i, are two products that you can use in conjunction with this feature.
Finally, the Name and Address operator enables you to generate postal reports for countries that support address correction and postal matching. Using postal reports in countries that support this feature often qualifies you for mailing discounts. For more information, see "Postal Reporting".
This example follows a record through a mapping using the Name and Address operator. This mapping also uses a Splitter operator to demonstrate a highly recommended data quality error handling technique.
In this example, your source data contains a Customer table with the row of data shown in Table 8-16.
| Address Column | Address Component |
|---|---|
|
Name |
Joe Smith |
|
Street Address |
8500 Normandale Lake Suite 710 |
|
City |
Bloomington |
|
ZIP Code |
55437 |
This data contains a nickname, a last name, and part of a mailing address, but it lacks the customer's full name, complete street address, and the state in which he lives. The data also lacks geographic information such as latitude and longitude, which can be used for distance calculations for truckload shipping. In order to complete the name and address data, you can use the Name and Address operator.
This example uses a mapping with a Name and Address operator followed by a Splitter operator to cleanse name and address records and then load them into separate targets depending on whether they were successfully parsed. This section explains the general steps required to design such a mapping. For detailed information on each type of operator, refer to that operator's description in this chapter.
To make the listed changes to the sample record:A CUSTOMERS table from which you extract the records. This is your data source. It contains the data in "Example Input".
A Name and Address operator. This action launches the Name and Address Wizard. Complete it following the instructions in "Using the Name and Address Operator in a Mapping".
A Splitter operator. For information on using this operator, see "Splitter Operator".
Three target operators to which you load (respectively):
The successfully parsed records
The records with parsing errors
The records whose addresses are parsed but not found in the postal matching software
You are not required to use the Splitter operator in conjunction with the Name and Address operator, but the combination highlights the Name and Address error handling capabilities.
Figure 8-21 shows a mapping designed for this example. The data is mapped from the source table to the Name and Address operator, and then to the Splitter operator. The Splitter operator separates the successfully parsed records from those that have errors. The output from OUTGRP1 is mapped to the CUSTOMERS_GOOD target. The Split Condition for OUTGRP2 is shown at the bottom of the screen: records whose Is Parsed flag is False are loaded to the NOT_PARSED target. Records in the REMAINING_RECORDS group are successfully parsed, but their addresses are not found by the postal matching software. These records are loaded to the PARSED_NOT_FOUND target.
If you run the mapping designed in this example, the Name and Address operator standardizes, corrects, and completes the address data from the source table. In this example, the target table contains the address data as shown in Table 8-17 (compare it with the input record from Table 8-16).
In this example, the following changes were made to the input data:
You have the following options for using the Name and Address operator:
Whether you are using the operator wizard or the Operator Editor, complete the following pages listed in a columnar table that reads down the columns left to right to conserve space:
|
N/A |
Use the General page to specify a name and optional description for the operator. By default, the wizard names the Name and Address operator "NAMEADDR."
Characterize the nature of your input data by assigning general definitions to this Name and Address operator.
Figure 8-22 shows the Definitions page containing sample values for the data described in ""Example: Following a Record Through the Name and Address Operator". Here, the Parsing Type is set to `Name and Address', Primary Country is set to `United States', and Dual Address Assignment is set to `P.O. Box'.
Select one of the following parsing types from the drop-down list:
Select the country which best represents the country distribution of your data. The primary country is used by some providers of name and address cleansing software as a hint for the appropriate parser or parsing rules to use on the initial parse of the record. For other name and address service providers, external configuration of their installation controls this behavior.
A dual address contains both a Post Office (PO) box and a street address for the same address record. For records that have dual addresses, your selection determines which address becomes the normal address and which address becomes the dual address. A sample dual address is:
PO Box 2589
4439 Mormon Coulee Rd
La Crosse WI 54601-8231
Note that your choice for Dual Address Assignment affects which postal codes are assigned during postal code correction, because the street address and PO box address may correspond to different postal codes.
This option has no effect for records having a single street or PO box address. Note that this option may not be supported by all name and address cleansing software providers.
By definition, the Name and Address operator has one input group and one output group. You cannot edit, add, or delete groups in the Name and Address operator. The input group is called INGRP1 and the output group is OUTGRP1. You can edit these names. If your input data requires multiple groups, create a separate Name and Address operator for each group.
You assign attributes to the INGROUP on the Input Connections page and then edit those attributes on the Input Attributes page. You assign attributes to the OUTGRP1 group on the Output Attributes page.
Use the Input Connections page to select attributes from any operator in your mapping that you want to copy and map into the operator. The Available Attributes panel enables you to select attributes from any operator in your mapping. The Mapped Attributes panel represents the Name and Address operator. When you move attributes from the left panel to the right, you map them to the operator.
Figure 8-23 shows the Input Connections page containing sample values for the example described in "Example: Following a Record Through the Name and Address Operator". Notice that the CUSTOMERS table columns are mapped as the input attributes.
To complete the Input Connections page for an operator:
To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.
Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.
Use the Input Attributes page to further modify the attributes you selected in the Input Connections page, and to assign input roles to each input attribute.
You can perform the following tasks from the Name and Address Input Attributes page:
Input roles indicate what kind of name or address information resides in a line of data. For each attribute, select the Input Role that most closely matches the data contained in the source attribute. Refer to Table 20-1 for a complete list of input roles and their descriptions.
You can select either non-discrete (line oriented) input roles for free-form data, or discrete roles (such as first name, primary address, or city) for specific input attributes. Whenever possible, choose discrete input roles (such as `Person'), rather than non-discrete roles (such as `Line1'). Discrete roles give the Name and Address operator more information about the content of the source attribute.
Figure 8-24 shows the Input Attributes page containing sample values for the data described in "Example: Following a Record Through the Name and Address Operator". In this example, the source table has only one NAME column containing both the first and last names, such as Joe Smith, so this column is assigned the `Person' input role. In the example, the source table also contains the entire street address portion of the mailing address in the STREETADDRESS column, so that column is assigned the `Address' input role.
Use the Output Attributes page to define output attributes that determine how the Name and Address operator handles parsed data. Specifically, the output attribute properties characterize the data that is extracted from the parser output.
The output attribute collection is initially empty. You can create and edit attributes.
Figure 8-25 shows the Output Attributes page containing sample values for the data described in "Example: Following a Record Through the Name and Address Operator".
Notice that every output attribute is assigned an output component. For example, the FirstName attribute is assigned the `First Name Standardized1' component. Notice also that `Latitude' and `Longitude' attributes are added to augment the address information.
Finally, notice that several error handling flags are added, such as Is Parsed, Is Good Name, and Is Good Address. These flags can be used with the Splitter operator to separate successfully parsed records from records with errors and load them into different targets.
Use the Output Attributes components dialog to define components for the output attributes you create. Each output component represents a discrete name or address entity, such as a title, a standardized first name, a street number, a street name, or a street type, and indicates which component of a name or address an attribute constitutes. The component you select is assigned to the output attribute.
For example, an input record containing John and Jane Doe would have two name occurrences: John Doe and Jane Doe. You can extract the first person with any name component by assigning Instance 1 to that component. Similarly, you can also extract the second person using any name component by assigning Instance 2. The number of instances allowed for various components depends on the vendor of the name and address cleansing software you use. Miscellaneous address may also have multiple instances, for example, if both an email address and phone number are present.
Figure 8-26 shows the Output Components page selecting the component for the first sample output attribute used in "Example: Following a Record Through the Name and Address Operator".
In "Example: Following a Record Through the Name and Address Operator", the address type is Normal. The example calls for the following output components: First Name Standardized1, Last Name1, Primary Address2, Secondary Address2, City, State, Postal Code, Latitude, Longitude, Is Parsed, Is Good Name, Name Warning, Is Good Address, Is Found, Street Warning, and City Warning.
Postal reporting applies only to countries that support address correction and postal matching. Country certification varies with different vendors of name and address cleansing software. The most common country certifications are United States, Canada, and Australia. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of postal codes (in the case of the United States, of five-digit ZIP Codes and ZIP+4 Codes), delivery point codes, and carrier route codes applied to all mail. Some vendors of name and address cleansing software may ignore these parameters and require external setup for generating postal reports. For more information, see "Postal Reporting".
Figure 8-27 shows the Postal Report page containing sample values.
To exit the Name and Address Wizard, click Finish. To exit the Name and Address Editor, click OK.
You can specify a postal report using the Postal Report page in the Name and Address operator editor or wizard. The postal report is generated when the mapping containing the Name and Address operator is executed.
Postal reporting applies to only those countries that support address correction and postal matching. Those countries vary among vendors of name and address cleansing software. The most common certifications are for United States, Canada, and Australia.
The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of postal codes (in the case of the United States, of five-digit ZIP Codes and ZIP+4 Codes), delivery point codes, and carrier route codes applied to all mail.
All address lists used to produce mailings for automation rates must be matched by postal report-certified software. Oracle9i Warehouse Builder Name and Address is built on name and address software and data supplied by third-party software vendors specializing in name and address cleansing. Certifications therefore depend on the vendor, and may include the following:
To access the postal report, you must have access to the file system where the Name and Address Server resides. The reports are processed by the Name and Address Server and are written to the owb/bin/admin/reports folder, located in the Oracle home path specified during the Warehouse Builder Server-Side installation. For more information on installation parameters, see the Oracle9i Warehouse Builder Installation and Configuration Guide.
For each report, Warehouse Builder creates a unique file name using the Country Code, Group Name, and the date and time the file is created, for example: p_CAN_TESTGROUP1_20021219_0130.txt. This naming convention may not apply to all vendors of postal matching software, because in some cases, the file naming is controlled through external configuration of the vendor installation.
A postal report-enabled mapping should only process that country's data, specified as the Primary Country in the Definitions page of the Name and Address wizard. If your source contains international data, and if the source records contain country codes, map the Country Code source column to an input group attribute of the Name and Address operator. Assign the `Country Code' input role to the attribute.
The pivot operator enables you to transform a single row of attributes into multiple rows. Use this operator in a mapping when you want to transform data that is contained across attributes instead of rows. This situation can arise when you extract data from non-relational data sources such as data in a crosstab format.
The external table SALES_DAT, shown in Figure 8-28, contains data from a flat file. There is a row for each sales representative and separate columns for each month. For more information on external tables, see "Using External Tables".
Table 8-18 shows a sample of the data after Warehouse Builder performs a pivot operation. The data that was formerly contained across multiple columns (M1, M2, M3...) is now contained in a single attribute (Monthly_Sales). A single ID row in SALES_DAT corresponds to 12 rows in pivoted data.
To perform the pivot transformation in this example, create a mapping like the one shown in Figure 8-29.
In this mapping, Warehouse Builder reads the data from the external table once, pivots the data, aggregates the data, and writes it to a target in set based mode. It is not necessary to load the data to a target directly after pivoting it. You can use the pivot operator in a series of operators before and after directing data into the target operator. You can place operators such as filter, joiner, and set operation before the pivot operator. Since pivoted data in Warehouse Builder is not a row-by-row operation, you can also execute the mapping in set based mode.
In the pivot operator, the row locator is an output attribute that you create to correspond to the repeated set of data from the source. When you use the pivot operator, Warehouse Builder transforms a single input attribute into multiple rows and generates values for a row locator. In this example, since the source contains attributes for each month, you can create an output attribute named `MONTH' and designate it as the row locator. Each row from SALES_DAT then yields 12 rows of pivoted data in the output.
Table 8-19 shows the data from the first row from SALES_DAT after Warehouse Builder pivots the data with `MONTH' as the row indicator.
You have the following options for using a pivot operator:
Whether you are using the Pivot operator wizard or the Pivot Editor, complete the following pages listed in a columnar table that reads down the columns left to right to conserve space:
Use the General page to specify a name and optional description for the pivot operator. By default, the wizard names the operator "Pivot."
Use the Groups page to specify one input and one output group.
In a pivot operator, the input group represents the data from the source that is contained across multiple attributes. The output group represents that data transformed into rows.
You can rename and add descriptions to the input and output groups. Since each pivot operator must have exactly one input and one output group, the wizard prevents you from adding or removing groups or changing group direction.
Use the Input Connections page to copy and map attributes into the pivot operator. The attributes you select become mapped to the pivot input group. The left side of the page displays a list of all the operators in the mapping. Figure 8-30 shows a group from the external table SALES_DAT selected as input for the pivot operator.
To complete the Input Connections page for a pivot operator:
To search for a specific attribute or group by name, type the text in Search for and select Go. To find the next match, select Go again.
Press the Shift key to select multiple attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.
Use the right to left arrow to remove groups or attributes from the input connections list. Warehouse Builder removes the selection from the input group and removes the mapping line between the source operator and the pivot operator. Figure 8-31 shows a group from SALES_DAT copied and mapped into the PIVOTSALES operator.
Use the Input Attributes page to modify the attributes you selected in the Input connections tab or wizard page.
You can perform the following tasks from the pivot Input Attributes page:
Use the Output Attributes page to create the output attributes for the pivot operator. If you designated any input attributes as keys on the Input Attributes tab or wizard page, Warehouse Builder displays those input attributes as output attributes that you cannot edit or delete. Figure 8-32 displays the output attributes with MONTH selected as the row locator.
You can perform the following tasks from the pivot Output Attributes Page:
In the pivot operator, the row locator is an output attribute that corresponds to the repeated set of data from the source. For example, if the source data contains separate attributes for each month, create an output attribute `MONTH' and designate it as the row locator.
Use the Pivot Transform page to write expressions for each output attribute.
By default, Warehouse Builder displays two rows. Use Add to specify how many rows of output you want from a single row in the source. For example, if your source contains an attribute for each quarter in a year, you can specify 4 rows of output for each row in the source. If the source data contains an attribute for each month in the year, you can specify 12 rows of output for each row in the source.
Figure 8-33 shows the Pivot Transform tab with the pivot expressions defined for a source with an attribute for each month.
Write pivot expressions based on the following types of output:
When using the wizard to create a new pivot operator, click Finish when you want to close the wizard. The Mapping Editor displays the operator you defined.
When using the Pivot Editor to edit an existing pivot operator, click OK when you have finished editing the operator. The Mapping Editor updates the operator with the changes you made.
Use a Post-Mapping Process operator to define a procedure to be executed after running a mapping. For example, you can use a Post-Mapping Process operator to reenable and build indexes after a mapping completes successfully and loads data into the target.
The Post-Mapping Process operator calls a function or procedure that is defined in Warehouse Builder after the mapping is executed. The output parameter group provides the connection point for the returned value (if implemented via a function) and the output parameters of the function or procedure. There are no restrictions on the connections of these output attributes
The Post-Mapping Process operator contains groups corresponding to the number and direction of the parameters associated with the selected PL/SQL procedure or function. This list of groups and attributes can only be modified through reconciliation.
You can only define one Post-Mapping Process operator for a mapping. If you want to run more than one procedure after a mapping, you must wrap the procedures into one procedure.
You can map constants, data generators, mapping input parameters, and output from a Pre-Mapping Process into a Post-Mapping Process operator. The Post-Mapping Process operator is not valid for an SQL*Loader mapping.
After you add a Post-Mapping Process operator to the Mapping Editor, use the operator properties dialog to specify run conditions in which to execute the process.
To use a Post-Mapping Process operator in a mapping:Warehouse Builder displays the Add Mapping Transformation dialog.
Always: The process runs regardless of errors from the mapping.
On Success: The process runs only if the mapping completes without errors.
On Error: The process runs only if the mapping completes with errors exceeding the number of allowed errors set for the mapping.
On Warning: The process runs only if the mapping completes with errors that are less than the number of allowed errors set for the mapping.
If you select On Error or On Warning and the mapping runs in row based mode, you must verify the Maximum Number of Errors set for the mapping. To view the number of allowed errors, right click on the mapping in the navigation tree, select Configure, and expand Runtime Parameters.
Use a Pre-Mapping Process operator to define a procedure to be executed before running a mapping. For example, you can use a Pre-Mapping Process operator to truncate tables in a staging area before running a mapping that loads tables to that staging area. You can also use a Pre-Mapping Process operator to disable indexes before running a mapping that loads data to a target. You can then use a Post-Mapping Process operator to reenable and build the indexes after running the mapping that loads data to the target.
The Pre-Mapping Process operator calls a function or procedure whose metadata is defined in Warehouse Builder prior to executing a mapping. The output parameter group provides the connection point for the returned value (if implemented with a function) and the output parameters of the function or procedure. There are no restrictions on the connections of these output attributes.
When you drop a Pre-Mapping Process operator onto the Mapping Editor canvas, a dialog opens displaying the available libraries, categories, functions, and procedures. After you select a function or procedure from the tree, the operator displays with predefined input and output parameters.
The Pre-Mapping Process operator contains groups corresponding to the number and direction of the parameters associated with the selected PL/SQL procedure or function.
A mapping can only contain one Pre-Mapping Process operator. Only constants, mapping input parameters, and output from a Pre-Mapping Process can be mapped into a Post-Mapping Process operator.
After you add a Pre-Mapping Process operator to the Mapping Editor, use the operator property dialog to specify conditions in which to execute the mapping.
To use a Pre-Mapping Process operator in a mapping:The Add Mapping Transformation dialog displays.
Always: Warehouse Builder runs the mapping after the process completes, regardless of the errors.
On Success: Warehouse Builder runs the mapping only if the process completes without errors.
On Error: Warehouse Builder runs the mapping only if the process completes with errors.
The Set Operation operator enables you to use following set operations in a mapping:
By default, the Set Operation operator contains two input groups and one output group. You can add input groups by using the operator editor. Mapping attributes to a Set Operation input group creates corresponding attributes with the same name and data type in the Set Operation output group. The number of attributes in the output group matches the number of attributes in the input group containing the most number of attributes.
To use the Set Operation operator:
Corresponding attributes are determined by the order of the attributes within an input group. For example, attribute 1 in input group 1 corresponds to attribute 1 in input group 2.
You must apply the set operation in top-down order. The order of the input groups determines the execution order of the set operation. This order only affects the minus operation. For example, A minus B is not the same as B minus A. The order of the attributes within the first input group determines the structure of a set. For example, {empno, ename} is not the same as {ename, empno}.
To use the Set Operation operator in a mapping:
The Set Operation properties window displays.
You can produce a sorted row set using the Sorter operator. The Sorter operator enables you to specify which input attributes are sorted and whether the sorting is performed in ascending or descending order. Warehouse Builder sorts data by placing an ORDER BY clause in the code generated by the mapping.
The Sorter operator has one input/output group. You can use the Sorter operator to sort data from any relational database source. You can place any operator after the Sorter operator.
The Sorter operator contains the following property:
The Sorter properties window displays.
The Order By Clause dialog displays.
Select an attribute from the Available Attributes list and click the right arrow button. Or, click the double right arrow button to select all of the Available Attributes.
Select the attribute in the ORDER BY Attributes list and select ASC (ascending) or DESC (descending) from the drop-down list.
You can use the Splitter operator to split data from one source to several targets. The operator splits a single input row set into several output row sets using a boolean split condition. Each output row set has a cardinality less than or equal to the input cardinality.
You can configure Warehouse Builder mappings that split data from one source to multiple targets to take advantage of Oracle9i functionality that optimizes SQL code and improves performance. For more information, see "".
The Splitter operator creates an output group called REMAINING_ROWS containing all input rows not included in any of the other output groups. You can delete this output group, but you cannot edit it.
The Splitter Operator contains the following properties:
To use the Splitter operator in a mapping:
The output attributes are created with data types matching the corresponding input data types.
The Splitter properties window displays as shown in Figure 8-35.
When you design and configure a mapping with the Splitter operator, Warehouse Builder generates a multi_table_insert statement. This SQL statement takes advantage of parallel query and parallel DML services in the Oracle9i database server.
From Warehouse Builder, right-click the target module on the navigation tree and select Configuration Properties. Under Target Database Type, select Oracle9i.
The targets must be tables, not views or materialized views. Each target table must have less than 999 columns. Between the Splitter operator and the targets, do not include any operators that change the cardinality. For example, you can place a Filter between the Splitter and the targets as shown in Figure 8-38, but not a Joiner or Aggregator operator.
Warehouse Builder displays the configuration properties dialog for a mapping as shown in Figure 8-39.
When you run this mapping and view the generation results, Warehouse Builder returns one total SELECT and INSERT count for all targets.
Table function operators enable you to manipulate a set of input rows and return another set of rows possibly of different cardinality. Unlike conventional functions, table functions can return a set of output rows that can be queried like a physical table. Using table functions can greatly improve performance when loading your data warehouse.
Table Functions have the following characteristics:
Before you can use the Mapping Table Function operator in a mapping, you need to create the table function in your target, external to Warehouse Builder. The table functions in the database that are supported by the unbound table function operator must meet the following requirements:
TABLE OF:
For an unbound Mapping Table Function operator in a mapping:
The Mapping Table Function operator contains the following properties:
General properties include the name and description. Specify the name of the table function located in the target database. The description is optional.
The table function operator accepts the following types of input parameters:
To use a Table Function operator in a mapping:
This automatically creates the input attributes.
Before you deploy the mapping containing the mapping table function operator, you must manually create the table function in the target warehouse. The mapping table function operator is bound to the actual table function object through the code generated by the mapping.
You use the Mapping Transformation operator to transform the column value data of rows within a row set using a PL/SQL function, while preserving the cardinality of the input row set.
The Mapping Transformation operator must be bound to a function or procedure contained by one of the modules in the repository. The inputs and outputs of the Mapping Transformation operator correspond to the input and output parameters of the bound repository function or procedure. If the Mapping Transformation operator is bound to a function, a result output is added to the operator that corresponds to the result of the function. The bound function or procedure must be generated and deployed before the mapping can be deployed, unless the function or procedure already exists in the target system.
Warehouse Builder provides pre-defined PL/SQL library functions in the runtime schema that can be selected as a bound function when adding a Mapping Transformation operator onto a mapping. In addition, you can choose a function or procedure from the Global Shared Library.
The Mapping Transformation operator contains the following properties:
To use a Mapping Transformation operator in a mapping:
The Add Mapping Transformation dialog displays.
The Mapping Transformation properties window displays.
The unpivot operator converts multiple input rows into one output row. The unpivot operator allows you to extract from a source once and produce one row from a set of source rows that are grouped by attributes in the source data. Like the pivot operator, the unpivot operator can be placed anywhere in a mapping.
Table 8-20 shows a sample of data from the SALES relational table. In the crosstab format, the `MONTH' column has 12 possible character values, one for each month of the year. All sales figures are contained in one column, `MONTHLY_SALES'.
Figure 8-40 depicts data from the relational table `SALES' after Warehouse Builder unpivoted the table. The data formerly contained in the `MONTH' column (Jan, Feb, Mar...) corresponds to12 separate attributes (M1, M2, M3...). The sales figures formerly contained in the `MONTHLY_SALES' are now distributed across the 12 attributes for each month.
When you use the unpivot operator, Warehouse Builder transforms multiple input rows into a single row based on the row locator. In the unpivot operator, the row locator is an attribute that you must select from the source to correspond with a set of output attributes that you define. A row locator is required in an unpivot operator. In this example, the row locator is `MONTH' from the `SALES' table and it corresponds to attributes M1, M2, M3... M12 in the unpivoted output.
You have the following options for using an unpivot operator:
Whether you are using the Unpivot operator wizard or the Unpivot Editor, complete the following pages listed in a columnar table that reads down the columns left to right to conserve space:
|
N/A |
Use the General page to specify a name and optional description for the unpivot operator. By default, the wizard names the operator "Unpivot."
Use the Groups page to specify one input and one output group.
In an unpivot operator, the input group represents the source data in crosstab format. The output group represents the target data distributed across multiple attributes.
You can rename and add descriptions to the input and output groups. Since each unpivot operator must have exactly one input and one output group, the wizard prevents you from adding or removing groups or changing group direction.
Use the Input Connections page to select attributes to copy and map into the unpivot operator.
To complete the Input connections page for an unpivot operator:To search for a specific attribute or group by name, type the text in Search for and click Go. To find the next match, click Go again.
Hold the Shift key down to select multiple groups or attributes. If you want to select attributes from different groups, you must first combine the groups with a Joiner or Set operator.
You can use the right to left arrow to move groups or attributes from the input connections list. Warehouse Builder removes the selection from the input group and removes the mapping line between the source operator and the pivot operator.
Use the Input Attributes page to modify the attributes you selected in the Input connections tab or wizard page.
You can perform the following tasks from the Unpivot Input Attributes Page:
Use the Row locator page to select a row locator and assign values to the distinct values contained in the row locator. Figure 8-41 shows the attribute MONTH selected as the row locator with values of `Jan', `Feb', `Mar', etc.
To complete the Unpivot Row Locator page:
In the unpivot operator, the row locator is the attribute from the source data that corresponds to a set of output attributes.
For string values, enclose the text in single quotes. For example, if the row locator is `MONTH', there would be a total of 12 distinct values for that attribute. Click Add to add a row for each distinct value. For row locator values, type values exactly as they appear in the source dataset. For instance, the row locator values as shown in Table 8-20 are `Jan', `Feb', and `Mar.'
Use the Output Attributes page shown in Figure 8-42 to create the output attributes for the unpivot operator.
If you designated any input attributes as keys on the Input Attributes tab or wizard page, Warehouse Builder displays those input attributes as output attributes that you cannot edit or remove.
You can perform the following tasks from the Pivot Output Attributes page:
Use the Unpivot Transform page shown in Figure 8-43 to write expressions for each output attribute.
For attributes you designated as keys, Warehouse Builder defines the matching row and expression for you. Warehouse Builder displays the first row as the match for a key attribute. For all other output attributes, specify the matching row and the expression.
|
|
![]() Copyright © 2001, 2003 Oracle Corporation. All Rights Reserved. |
|