| Oracle9i Warehouse Builder User's Guide Release 9.2 Part Number B10996-01 |
|
Earlier in the design phase, you defined a logical model for your target system using Warehouse Builder design objects. This chapter includes reference information and recommendations for assigning physical properties to mappings and process flows.
This chapter includes:
When you correctly configure mappings, you can improve the ETL performance. Use this section as a reference for setting configuration parameters that govern how Warehouse Builder loads data and optimizes code for better performance.
This section includes the following topics:
Or, right-click the mapping you want to configure and select Configure from the pop-up menu.
Warehouse Builder displays the configuration Properties dialog for a mapping as shown in Figure 11-1.
The default setting is true. If you set deployable to false for a mapping, Warehouse Builder does not generate scripts for that mapping.
The options you can choose from depend upon the design and use of the operators in the mapping. Warehouse Builder sets the correct value depending on the mapping: PL/SQL, SQL*Loader, ABAP (for an SAP source mapping).
For a description of each runtime parameter, see Runtime Parameters Reference.
For a description of each option, see "Code Generation Options Reference".
The Configuration Properties dialog displays all source and target operators in a mapping under Sources and Targets. Each operator contains a set of properties you can edit. For information on configuring sources and targets in a mapping, see "Sources and Targets Reference".
When you configure Runtime Parameters for a mapping, you set the default behaviors for the mapping. You can override these parameters when you execute the mapping either in the Deployment Manager, the Process Flow Editor, or Oracle Enterprise Manager.
The Runtime Parameters include the following parameters:
For mappings with a PL/SQL implementation, select a default operating mode. The operating mode you select can greatly affect mapping performance. For details on how operating modes affect performance, see "Selecting a Default Operating Mode". You can select one of the following operating modes:
Use Bulk Size to specify the number of rows in each bulk for PL/SQL Bulk Processing. Warehouse Builder uses the Bulk Size parameter only when Generate Bulk Processing Mode is set to true. For more information, see the Oracle PL/SQL Reference Guide.
Use Default Audit Level to indicate the audit level used when executing the package. Audit levels dictate the amount of audit information captured in the runtime schema when the package is run. The audit level settings are:
Use Maximum Number of Errors to indicate the maximum number of errors allowed when executing the package. Execution of the package terminates when the number of errors reached is greater than the maximum number of errors value.
Commit frequency only applies to non-bulk mode mappings. Bulk mode mappings commit according to the bulk size.
When you set the Default Operating Mode to row based and Bulk Processing Code to false, Warehouse Builder uses the Commit Frequency parameter when executing the package. Warehouse Builder commits data to the database after processing the number of rows specified in this parameter.
If you set Bulk Processing Code to true, set the Commit Frequency equal to the Bulk Size. If the two values are different, Bulk Size overrides the commit frequency and Warehouse Builder implicitly performs a commit for every bulk size.
The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.
When you set Analyze Table Statements to true, Warehouse Builder estimates when gathering statistics on the target tables. After data is loaded into the target tables, statistics used for cost-based optimization are gathered on each target table. You can set this parameter to the percentage of rows in each target table used for this analysis.
The Code Generation Options include the following:
If this configuration parameter is set to true, Warehouse Builder generates PL/SQL bulk processing code. PL/SQL bulk processing improves row-based ETL performance by collecting, processing, and writing rows in bulk, instead of doing it row by row. The size of each bulk is determined by the configuration parameter Bulk Size. Set based mode offers optimal performance, followed by bulk processing, and finally by row based mode. For more information, see the Oracle PL/SQL Reference Guide.
Row based mode is an operating mode option. Row based mode allows for the maximum amount of runtime auditing. A mapping that runs in row-based mode processes data from a source record by record.
Set this parameter to true to generate code using the table function feature supported in Oracle9i. This setting improves performance by enabling table functions to use parallel execution in the Oracle Server for Row based and Row based (Target only) operating modes.
You can set this parameter to true when the mapping generates PL/SQL code and is executed by Oracle9i server.
If the mapping generates SQL* Loader code, set the parameter to false (default). This property is only available for mappings that will generate PL/SQL code. It will not be available as a configuration property for mappings that will generate SQL* Loader code. This property is not available for set-based mode, but for set-based-fail-over-rowbased and set-based-fail-over-rowbased-target modes, this property is available if the set based mode fails and the code executes in row-based/row-based-target mode. This property is not available for mappings that include a Mapping Output Parameter, a Flat File operator, or a Sequence operator as a source.
Scenarios when parallelism is lost for table functions:
If you set this parameter to true, Warehouse Builder generates code for analyzing the target table after the target is loaded if the resulting target table only is double or half its original size.
Set this parameter to true to improve performance for mappings that include the Splitter operator and inserts into multiple target tables. When this parameter is set to true and the mapping is executed by Oracle9i, Warehouse Builder generates a single SQL statement (multi_table_insert) that inserts data into multiple tables based on same set of source data.
Warehouse Builder performs the multiple table insert only if this parameter is set to true and the Oracle target module database is Oracle9i. The multiple tables insert is performed only for mappings in set based mode that include a Splitter operator and no active operators such as an Aggregator or Joiner operator between the Splitter and the target. Also, the multiple insert is available only for tables, not views, materialized views, dimensions, or cubes. Each target table must have fewer than 999 columns. For detailed instructions on how to create a mapping with multiple targets, see "Example: Creating Mappings with Multiple Targets".
Set this parameter to false for mappings run in row based mode or for mappings executed by Oracle8i server. You may also want to set this parameter to false when auditing is required. When set to true, Warehouse Builder returns one total SELECT and INSERT count for all targets.
This applies to mappings with multiple targets. When set to true, Warehouse Builder commits data based on the selected records. When set to false, Warehouse Builder commits data based on target by target processing (insert, update, delete).
The mapping behavior varies according to the operating mode you select. For more information about correlated commit, see "Committing Data from a Single Source to Multiple Targets"
The Configuration Properties dialog displays all source and target operators in a mapping under the Sources and Targets. The properties that are available for configuration vary according to the types of source and target operators used in the mapping.
For relational and dimensional sources and targets such as tables, views, and cubes, Warehouse Builder displays the following set of properties for each operator:
When Warehouse Builder generates code, it identifies each source and target operator by its bound name. By default, the bound name is the same name as the operator name.
This parameter maintained for backwards compatibility only.
In previous version of Warehouse Builder, you could link the mapping to a particular schema by clicking on the Schema field and typing a name.
This parameter maintained for backwards compatibility only.
In previous version of Warehouse Builder, you could select a database link by name from the drop-down list. Source operators can be configured for schemas and database links, but targets can be configured for schemas only. Sources and targets can reside in different schemas, but they must reside in the same database instance.
Set this parameter to true to enable partition exchange loading (PEL) into a target table. For information on how to design mappings for PEL, see "Using Partition Exchange Loading"
Define extraction or loading hints. Application developers often develop insights into their data. For example, they know that a query runs much faster if a set of tables is joined in one order rather than another. Warehouse Builder can incorporate these insights into the generated SQL code packages as SQL Optimizer Hints.
When you select a hint from the Hints dialog, the hint appears in the Existing Hints field. Type additional text as appropriate in the Extra Text column. The editor includes the hint in the mapping definition as is. There is no validation or checking on this text.
For information on optimizer hints and how to use them, see Oracle9i Designing and Tuning for Performance.
Configure the following Constraint Management parameters:
When you disable constraints, loading time is decreased since constraint checking is not performed. However, if exceptions occur for any rows during re-enabling, the constraints for those rows will remain in a non-validated state. These rows are logged in the runtime audit error table by their ROWID. You must manually inspect the error rows to take any necessary corrective action.
Setting the Enable Constraints to false is subject to the following restrictions:
For SQL and PL/SQL loading, if you do not specify an exceptions table, invalid rows load into a temporary table located in the default tablespace and then load into the Runtime Audit error table. The table is dropped at the end of the load.
If you are using SQL*Loader direct path loading, you must specify an exception table. Consult the SQL*Loader documentation for more information.
The Configuration Properties dialog contains additional settings for Mapping Flat File operators, depending on how the operators are used in the mapping.
Or, right-click the mapping you want to configure and select Configure from the pop-up menu.
Warehouse Builder displays the Configuration Properties dialog as shown in Figure 11-2.
For each parameter, you can either select an option from a list, type a value, or click ... to display another properties dialog.
The Configuration Properties dialog appears as shown in Figure 11-3.
For each parameter, you can either select an option from a list, type a value, or click ... to display another properties dialog.
Type a complete path name for the file in the syntax of the operating system on which the SQL*Loader script is deployed. The value you provide is placed in the INFILE clause. If you enter a value enclosed in single quote, the generated code will be LOAD. CONTINUE_LOAD is used when a direct load of multiple tables is discontinued and needs to be restarted. It is used in conjunction with the operator-level SKIP option.
If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. If Continue Load is set to true, Warehouse Builder attempts to continue discontinued loads.
Perform Parallel Load: Specifies the value of the PARALLEL option as either =TRUE or =FALSE. True indicates that direct loads can operate in multiple concurrent sessions.
Errors Allowed: If the value specified is greater than 0, then the ERRORS = n option is generated. SQL*Loader terminates the load at the first consistent point after it reaches this error limit.
Records To Skip: If the value specified is greater than 0, then the SKIP = n option is generated. This value indicates the number of records from the beginning of the file that should not be loaded. If the value is not specified, no records are skipped.
Records To Load: If the value specified is greater than 0, then the LOAD = n option will be generated. This value specifies the maximum number of records to load. If a value is not specified all of the records are loaded.
Rows Per Commit: If the value specified is greater than 0, then the ROWS = n option is generated. For direct path loads, the value identifies the number of rows to read from the source before a data is saved. For conventional path loads, the value specifies the number of rows in the bind array.
Read Size: If the value specified is greater than 0, then the READSIZE = n option is generated. The value is used to specify the size of the read buffer.
Bind Size: If the value specified is greater than 0, then the BINDSIZE = n option is generated. The value indicates the maximum size in bytes of the bind array.
Read Buffers: If the value specified is greater than 0, then the READBUFFERS n clause is generated. READBUFFERS specifies the number of buffers to use during a direct path load. Do not specify a value for READBUFFERS unless it becomes necessary.
Preserve Blanks: If this parameter is set to TRUE, then the PRESERVE BLANKS clause is generated. PRESERVE BLANKS retains leading white space when optional enclosure delimiters are not present. It also leaves the trailing white space intact when fields are specified with a predetermined size.
Database File Name: This parameter enables you to specify the characteristics of the physical files to be loaded. The initial values for these parameters are set from the properties of the flat file used in the mapping.
If this parameter is set to a non-blank value, then the FILE= option is generated. The value specified is enclosed in single quotes in the generated code.
Control File Location and Control File Name: The control file name necessary for audit details.
For more information on each SQL*Loader option and clause, see Oracle9i Database Utilities.
Audit: Perform audit when the step is executed.
Default Purge Group: The Default Purge Group is used when executing the package. Each audit record in the runtime schema is assigned to the purge group specified.
This section discusses various strategies for configuring mappings. Use this section to determine which configuration settings are appropriate for a given scenario.
This section includes the following topics:
For mappings with a PL/SQL implementation, select one of the following operating modes:
The default operating mode you select depends upon the performance you expect, the amount of auditing data you require, and how you design the mapping. Mappings have at least one and as many as three valid operating modes, excluding the options for failing over to row based modes. During code generation, Warehouse Builder generates code for the specified default operating mode as well as the unselected modes. Therefore, at runtime, you can select to run in the default operating mode or any one of the other valid operating modes.
The types of operators in the mapping may limit the operating modes you can select. As a general rule, mappings run in set based mode can include any of the operators except for Match-Merge, Name-Address, and Transformations used as procedures. Although you can include any of the operators in row based and row based (target only) modes, there are important restrictions on how you use SQL based operators such as Aggregators, Joins, and Key Lookups. To use SQL based operators in either of the row based modes, ensure that the operation associated with the operator can be included in the cursor.
These general rules are explained in the following sections. For more information on how to use operators, see "Avoiding Invalid Designs for PL/SQL Mappings"
In set based mode, Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Although processing data as a set improves performance, the auditing information available is limited. Runtime Auditing is limited to reporting to the execution error only. In set based mode, you cannot view details on which rows contain errors.
Table 11-4 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in set based operating mode. TAB1, FLTR, and TAB2 are processed as a set using SQL.
To correctly design a mapping for the set based mode, avoid operators that require row by row processing such as Match-Merge and Name-Address operators. If you include an operator in the dataflow that cannot be performed in SQL, Warehouse Builder does not generate set based code and issues an error when you execute the package in set based mode.
For target operators in a mapping, set their loading types to either INSERT/UPDATE or UPDATE/INSERT. Warehouse Builder does not support UPDATE or DELETE loading in set based mode. For a complete listing of how Warehouse Builder handles operators in set based mappings, see Table 11-3
In row based mode, Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor.
Table 11-5 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based operating mode. TAB1 is included in the cursor and processed as a set using SQL. FLTR and TAB2 are processed row by row using PL/SQL.
If the mapping includes any SQL based operators, which cannot be performed in PL/SQL, Warehouse Builder attempts to generate code with those operations in the cursor. To generate valid row based code, design your mapping such that , if you include any of the following SQL based operators, Warehouse Builder can include the operations in the cursor:
In order for the above operators to be included in the cursor, do not directly precede it by an operator that generates PL/SQL code. In other words, you can not run the mapping in row-based mode if it contains a Transformation implemented as procedure, a Flat File used as a source, a Match-Merge, or Name-Address operator directly followed by any of the seven SQL based operators. For the design to be valid, include a staging table between the PL/SQL generating operator and the SQL based operator. For more information about how to design mappings, see "Avoiding Invalid Designs for PL/SQL Mappings"
In row based (Target Only) mode, Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. You can access full runtime auditing information for all operators performed in PL/SQL and only limited information for operations performed in the cursor. Use this mode when you expect fast set based operations to extract and transform the data but need extended auditing for the loading the data, which is where errors are likely to occur.
Table 11-6 shows a simple mapping and the associated logic Warehouse Builder uses to generate code for the mapping when run in row based (target only) operating mode. TAB1 and FLTR are included in the cursor and processed as a set using SQL. TAB2 is processed row by row.
Row based (target only) places the same restrictions on SQL based operators as the row based operating mode. Additionally, for mappings with multiple targets, Warehouse Builder generates code with a cursor for each target.
If you want to populate multiple targets based on a common source, you may also want to ensure that every row from the source is represented correctly across all targets.
Figure 11-7 shows a mapping that illustrates this case. The target tables all depend upon the source table. For every row added to TARGET_1, one row should be added to the other targets. If this relationship is not maintained when you reload data, the data becomes inaccurate and possibly unusable.
If the number of rows from SOURCE is relatively small, maintaining the three targets may not be difficult. Manually maintaining targets dependent on a common source, however, becomes more tedious as you increase the number of rows from the source, or as you design more complex mappings with more targets and transformations.
To use Warehouse Builder to ensure that every row in the source is properly represented in every target, configure the mapping to use a correlated commit strategy.
When you assign a commit strategy for a mapping in Warehouse Builder, you determine the commit and rollback behaviors for the multiple targets dependent on a single source. You can choose between the following strategies:
The combination of the commit strategy and operating mode determines mapping behavior. Table 11-1 shows the valid combinations you can select.
| Operating Mode | Correlated Commit | Independent Commit |
|---|---|---|
|
Set based |
Valid |
Valid |
|
Row based |
Valid |
Valid |
|
Row based (target only) |
Not Recommended |
Valid |
Correlated commit is not recommended for row based (target only). By definition, this operating mode places the cursor as close to the target as possible. In most cases, this results in only one target for each select statement and negates the purpose of committing data to multiple targets. If you design a mapping with the row based (target only) and correlated commit combination, Warehouse Builder runs the mapping but, in most cases, does not perform the correlated commit.
To understand the affects each operating mode and commit strategy combination has on a mapping, consider the mapping from Figure 11-7Assume the data from source table equates to 1,000 new rows. When the mapping runs successfully, Warehouse Builder loads 1,000 rows to each of the targets. If the mapping fails to load the 100th new row to Target_2, you can expect the following results, ignoring the influence from other configuration settings such as Commit Frequency and Number of Maximum Errors:
Correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or updated/insert) in set based mode.
Correlated commit operates transparently with PL/SQL bulk processing code.
When you use Correlated Commit to ensure all source rows are represented in multiple targets, consider using the Set based fail over to row based operating mode. In this way, you can find a balance between performance and error handling.
The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or include an Advanced Queue, Match-Merge, or Table Function operator.
Warehouse Builder generates code for PL/SQL mappings that meet the following criteria:
As you design a mapping, you can evaluate its validity by taking note of the input and output code types for each operator in the mapping. For example, you can see that the mapping in Figure 11-8 is invalid because the Match-Merge operator MM generates PL/SQL output but the subsequent Join operator accepts SQL input only.
To achieve the desired results for the mapping, consider joining the source tables before performing the Match-Merge, as shown in Figure 11-9, or loading the results from the Match-Merge to a staging table before performing the join, as shown in Figure 11-10.
Table 11-3 and Table 11-3 list the implementation types for each Warehouse Builder operator. These tables also indicate whether or not PL/SQL code includes the operation associated with the operator in the cursor. This information is relevant in determining which operating modes are valid for a given mapping design. It also determines what auditing details are available during error handling.
Data partitioning can improve performance when loading or purging data in a target system. This performance feature is referred to as Partition Exchange Loading (PEL).
PEL is recommended when loading a relatively small amount of data into a target containing a much larger volume of historical data. The target can be a table, a dimension, or a cube in a data warehouse.
This section includes the following topics:
By manipulating partitions in your target system, you can use Partition Exchange Loading (PEL) to instantly add or delete data. When a table is exchanged with an empty partition, new data is added.
You can use PEL to load new data by exchanging it into a target table as a partition. For example, a table that holds the new data assumes the identity of a partition from the target table and this partition assumes the identity of the source table. This exchange process is a DDL operation with no actual data movement. Figure 11-11 illustrates this example.
In Figure 11-11, data from a source table Source is inserted into a target table consisting of four partitions (Target_P1, Target_P2, Target_P3, and Target_P4). If the new data needs to be loaded into Target_P3, the partition exchange operation only exchanges the names on the data objects without moving the actual data. After the exchange, the formerly labeled Source is renamed to Target_P3, and the former Target_P3 is now labeled as Source. The target table still contains four partitions: Target_P1, Target_P2, Target_P3, and Target_P4. The partition exchange operation available in Oracle9i completes the loading process without data movement.
Warehouse Builder displays the Configuration Properties window as shown in Figure 11-12.
When you use Warehouse Builder to load a target by exchanging partitions, you can load the target indirectly or directly.
If you design a mapping using PEL and it includes remote sources or a join of multiple sources, Warehouse Builder must perform source processing and stage the data before partition exchange can proceed. Therefore, configure such mappings with Direct PEL set to False. Warehouse Builder transparently creates and maintains a temporary table that stores the results from source processing. After performing the PEL, Warehouse Builder drops the table.
Figure 11-13 shows a mapping that joins two sources and performs an aggregation. If all new data loaded into the ORDER_SUMMARY table is always loaded into same partition, then you can use Indirect PEL on this mapping to improve load performance. In this case, Warehouse Builder transparently creates a temporary table after the Aggregator and before ORDER_SUMMARY.
Warehouse Builder creates the temporary table using the same structure as the target table with the same columns, indexes, and constraints. For the fastest performance, Warehouse Builder loads the temporary table using parallel direct-path loading INSERT. After the INSERT, Warehouse Builder indexes and constrains the temporary table in parallel.
Use Direct PEL when the source table is local and the data is of good quality. You must design the mapping such that the source and target are in the same database and have exactly the same structure. The source and target must have the same indexes and constraints, the same number of columns, and the same column types and lengths.
For example, assume that you have the same mapping from Figure 11-13 but would like greater control on when data is loaded into the target. Depending on the amount of data, it could take hours to load and you would not know precisely when the target table would be updated.
To instantly load data to a target using Direct PEL:Design the staging table to exactly match the structure of the final target that you will load in a separate mapping. For example, the staging table in Figure 11-13 is ORDER_SUMMARY and should be of the same structure as the final target, ORDER_CUBE in Figure 11-14.
You can use PEL effectively for scalable loading performance if the following conditions are true:
To configure targets in a mapping for PEL:
Warehouse Builder does not automatically create partitions during runtime. You must create all partitions before you can use PEL. See "Creating Partitions".
For example, if you select Month as the frequency of new data collection, you need to create all the required partitions for each month of new data. Use the Configuration Properties window to create partitions for a table, dimension, or cube. Figure 11-15 shows the property inspector window for table ORDER_SUMMARY. This figure shows six partitions that have been added for this table.
To use PEL, all partition names must follow a naming convention. For example, for a partition that will hold data for May 2002, the partition name must be in the format Y2002_Q2_M05.
For PEL to recognize a partition, its name must fit one of the following formats.
Ydddd
Ydddd_Qd
Ydddd_Qd_Mdd
Ydddd_Qd_Mdd_Ddd
Ydddd_Qd_Mdd_Ddd_Hdd
Ydddd_Qd_Mdd_Ddd_Hdd_Mdd
Where d represents a decimal digit. All the letters must be in upper case. Lower case is not recognized.
If all partitions are added with correct names, Warehouse Builder automatically computes the Value Less Than property for each partition. Otherwise, the Value Less Than is must be manually configured for each partition in order for Warehouse Builder to generate a DDL statement. The following is an example of a DDL statement generated by Warehouse Builder:
. . . PARTITIONA_PARTITION_NAMEVALUES LESS THAN (TO_DATE(`01-06-2002','DD-MM-YYYY')), . . .
Figure 11-16 shows automatically generated configuration values for the Value Less Than parameter.
Figure 11-17 shows an index (ORDER_SUMMARY_PK_IDX) added to the ORDER_SUMMARY table. This index has two columns, ORDER_DATE and ITEM_ID. Configure the following:
Now Warehouse Builder can generate a DDL statement for a unique local index on table ORDER_SUMMARY.
Using local indexes provides the most important PEL performance benefit. Local indexes require all indexes to be partitioned in the same way as the table. When the temporary table is swapped into the target table using PEL, so are the identities of the index segments.
If an index is created as a local index, the Oracle server requires that the partition key column must be the leading column of the index. In the above example, the partition key is ORDER_DATE and it is the leading column in the index ORDER_SUMMARY_PK_IDX.
In this step you must specify that all primary key and unique key constraints are created with the USING INDEX option. Figure 11-18 shows an example where the primary key constraint ORDER_SUMMARY_PK on the ORDER_SUMMARY table is specified with the USING INDEX option.
With the USING INDEX option, a constraint will not trigger automatic index creation when it is added to the table. The server will search existing indexes for an index with same column list as that of the constraint. Thus, each primary or unique key constraint must be backed by a user-defined unique local index. The index required by the constraint ORDER_SUMMARY_PK is ORDER_SUMMARY_PK_IDX which was created in "Step 2: Create All Indexes Using the LOCAL Option".
These are the restrictions for using PEL in Warehouse Builder:
Warehouse Builder displays the Configuration Properties sheet for the process flow as shown in Figure 11-19.
This setting governs the behavior for activities that return NUMBER in their output. These activities include the FTP, External Process, and Transformation activities. When you set Use Return as Status to true, the Process Flow Editor assigns the outgoing transition conditions based on the following numerical return values for the activity:
1 = Success Transition
2 = Warning Transition
3 = Error Transition
Execution Location: The location from which this activity is executed. If you configured Oracle Enterprise Manager, you can select an OEM agent to execute the process flow.
Remote Location: The remote location for FTP activities only.
Working Location: The working location for FTP, FILE EXISTS and External Process activities only.
Deployed Location: The deployment location. This setting applies to transformation activities only. For activities referring to pre-defined transformations, you must change the setting from Use Default Location and specify a valid location.
|
|
![]() Copyright © 2001, 2003 Oracle Corporation. All Rights Reserved. |
|