| Oracle9i Warehouse Builder User's Guide Release 9.2 Part Number B10996-01 |
|
This chapter contains additional information about using the Name and Address operator, discussed in Chapter 8, "Using Mapping Operators". This chapter contains the following topics:
Input roles indicate what kind of name or address information resides in a line of data. Each input attribute in the Name and Address operator must have an input role that most closely matches the data contained in the source attribute. Input roles can either be non-discrete, or line oriented, input roles for free-form data (such as `Line1'); or they can be discrete roles for specific input attributes (such as `Person', `First Name', `Primary Address', or `City'). Discrete roles give the Name and Address operator more information about the content of the source attribute.
Table 20-1 lists input roles and descriptions for the Name and Address Operator.
Each output component represents a discrete name or address entity, such as a title, a standardized first name, a street number, a street name, or a street type. An output component is assigned to the corresponding output attribute, and identifies the portion of a name or address that the attribute constitutes. Table 20-2 lists the Name and Address operator output components and their descriptions.
| Parent Node(s) | Output Component | Description |
|---|---|---|
|
Name |
Pre NameFootref 1 |
Title or salutation appearing before a name; for example, Ms. or Dr. |
|
Name |
First Name Standardized1 |
Standard version of first name; for example, Theodore for Ted or James for Jim. |
|
Name |
Middle Name Standardized1 |
Standardized version of the middle name; for example, Theodore for Ted or James for Jim. Used when there is only one middle name, or for the first of several middle names (such as "Herbert" in George Herbert Walker Bush). |
|
Name |
Middle Name 2 Standardized1 |
Standardized version of the second middle name; for example, Theodore for Ted or James for Jim. |
|
Name |
Middle Name 3 Standardized1 |
Standardized version of the third middle name; for example, Theodore for Ted or James for Jim. |
|
Name |
Post Name1 |
Name suffix indicating generation; for example, Sr., Jr., or III. |
|
Name |
Other Post Name1 |
Name suffix indicating certification, academic degree, or affiliation; for example, Ph.D., M.D., or R.N. |
|
Name |
Name Designator1 |
Personal name designation; for example, "ATTN" (to the attention of) or "C/O" (care of). |
|
Name |
Relationship1 |
Information related to another person; for example, "Trustee for." |
|
Name |
Person1 |
First name, middle name, and last name. |
|
Name: Person |
First Name1 |
The first name found in the input name. |
|
Name: Person |
Middle Name1 |
Middle name or initial. Use this for a single middle name, or for the first of several middle names (such as "Herbert" in George Herbert Walker Bush). |
|
Name: Person |
Middle Name 21 |
Second middle name; for example, "Walker" in George Herbert Walker Bush. |
|
Name: Person |
Middle Name 31 |
Third middle name; for example, "Louise" in Ethel May Roberta Louise Mertz. |
|
Name: Person |
Last Name1 |
Last name, or surname. |
|
Name: Derived |
Gender1 |
Probable gender: |
|
Name: Derived |
Person Count |
Number of persons the record references. For example, a record with a Person name of "John and Jane Doe" has a Person Count of 2. |
|
Name: Business |
Firm Name1 |
Name of the company or organization, including divisions. |
|
Name: Business |
Firm Count1 |
Number of firms referenced in the record. |
|
Address |
Address2 |
Full address line, including: |
|
Address |
Primary Address2 |
Box, route, or street address, including:
This does not include the Unit Designator or the Unit Number. |
|
Address: Primary Address |
Street Number2 |
Number that identifies the address, such as a house or building number, sometimes referred to as the primary range. For example, in 200 Oracle Parkway, the Street Number is 200. |
|
Address: Primary Address |
Pre Directional2 |
Street directional indicator appearing before the street name; for example, in 100 N University Drive, the Pre Directional is `N'. |
|
Address: Primary Address |
Street Name2 |
Name of street. |
|
Address: Primary Address |
Street Type2 |
Street identifier; for example, ST, AVE, RD, DR, or HWY. |
|
Address: Primary Address |
Post Directional2 |
Street directional indicator appearing after the street name; for example, in 100 15th Ave. S., the Post Directional is `S'. |
|
Address |
Secondary Address2 |
The second part of the street address, including: For example, in a secondary address of Suite 2100, the Unit Designator is `STE' (a standardization of "Suite") and the Unit Number is `2100'. |
|
Address: Secondary Address |
Unit Designator2 |
Type of secondary address, such as APT or STE. For example, in a secondary address of Suite 2100, the Unit Designator is `STE' (a standardization of "Suite").
|
|
Address: Secondary Address |
Unit Number2 |
Number that identifies the secondary address, such as the apartment or suite number. For example, in a secondary address of Suite 2100, the Unit Number is `2100'. |
|
Address |
Last Line |
Final address line, including: |
|
Address: Last Line |
Neighborhood |
Neighborhood or barrio, common in South and Latin American addresses. |
|
Address: Last Line |
City |
Name of city. The US city names may be converted to United States Postal Service preferred names. |
|
Address: Last Line |
State |
Name of state or province. |
|
Address: Last Line |
Postal Code |
Full postal code with spaces and other non-alphanumeric characters removed. |
|
Address: Last Line |
Postal Code Formatted |
Formatted version of postal code that includes spaces and other non-alphanumeric characters, such as dashes. |
|
Address: Last Line |
Delivery Point |
Applies to United States and Australia. |
|
Address: Last Line |
Country Code |
The ISO 3166-1993 (E) two-character country code, as defined by the International Organization for Standardization; for example, `US' for United States or `CA' for Canada. |
|
Address: Last Line |
Country Code 3 |
The ISO 3166-1993 (E) three-character country code, as defined by the International Organization for Standardization; for example, `USA' for United States, `FRA' for France, or `UKR' for Ukraine. |
|
Address: Last Line |
Country Name |
The full country name. |
|
Address: Other Address Line |
Box Name2 |
The name for a post office box address; for example, for "PO Box 95," the Box Name is `PO BOX'. |
|
Address: Other Address Line |
Box Number2 |
The number for a post office box address; for example, for "PO Box 95," the Box Number is `95'. |
|
Address: Other Address Line |
Route Name2 |
Route name for a rural route address. For an address of "Route 5 Box 10," the Route Name is `RTE' (a standardization of "Route"). |
|
Address: Other Address Line |
Route Number2 |
Route number for a rural route address. For an address of "Route 5 Box 10," the Route Number is `5'. |
|
Address: Other Address Line |
Building Name |
Building name, such as "Cannon Bridge House." Building names are common in the United Kingdom. |
|
Address: Other Address Line |
Complex |
Building, campus, or other complex. For example, USS John F. Kennedy, Shadow Green Apartments, Cedarvale Gardens, Concordia College. You can use an the Instance field in the Output Components dialog to specify which complex should be returned in cases where an address has more than one complex. |
|
Address: Other Address Line |
Miscellaneous Address |
Miscellaneous address information, such as a telephone number or an e-mail address. In records with multiple miscellaneous fields, you can extract several by specifying which instance to use in the Name and Address operator Output Components screen. For example, in a record that has both an e-mail address and a telephone number, you can extract both items of information by using Miscellaneous Address 1 and Miscellaneous Address 2. |
|
Error Status: Name and Address |
Is Good Group |
Possible values: T or F. Indicates whether the name group, address group, or name and address group was processed successfully:
Using this flag in conjunction with another flag, such as the Is Parsed Flag, followed by the Splitter operator, enables you to isolate unsuccessfully parsed records in their own target, where you can address them separately. |
|
Error Status: Name and Address |
Is Parsed |
Indicates whether the name or address was parsed:
Check the status of parsing warning flags (such as Name Warning, City Warning, etc.). |
|
Error Status: Name and Address |
Parse Status |
Postal matching software parse status code. |
|
Error Status: Name and Address |
Parse Status Description |
Text description of the postal matching software parse status. |
|
Error Status: Name Only |
Is Good Name |
Indicates whether the name was parsed successfully: |
|
Error Status: Name Only |
Name Warning |
Indicates whether the parser found unusual or possibly erroneous data in a name:
|
|
Error Checking: Address Only |
Is Good Address |
Indicates whether the address was processed successfully:
This flag is easier to use if you have a mix of records from both postal matched and non postal matched countries. |
|
Error Checking: Address Only |
Is Found |
Indicates whether the address is listed in the postal matching database for the country indicated by the address:
This flag is true only if all of the flags listed below are true. If postal matching is available, this flag is the best indicator of record quality. |
|
Error Checking: Address Only: Is Found |
City Found |
Indicates whether the postal matcher found the city: |
|
Error Checking: Address Only: Is Found |
Street Name Found |
Indicates whether the postal matcher found the street name: |
|
Error Checking: Address Only: Is Found |
Street Number Found |
Indicates whether the postal matcher found the street number within a valid range of numbers for the named street: |
|
Error Checking: Address Only: Is Found |
Street Components Found |
Indicates whether the postal matcher found the street components, such as the Pre Directional or Post Directional: |
|
Error Checking: Address Only: Is Found |
Non-ambiguous Match Found |
Indicates whether the postal matcher found a matching address in the postal database:
|
|
Error Checking: Address Only |
City Warning |
Indicates whether the parser found unusual or possibly erroneous data in a city: |
|
Error Checking: Address Only |
Street Warning |
Indicates whether the parser found unusual or possibly erroneous data in a street address: |
|
Error Checking: Address Only |
Is Address Verifiable |
Indicates whether postal matching is available for the country of the address, but does not indicate the outcome of the matching operation:
|
|
Error Checking: Address Only |
Address Corrected |
Indicates whether the address was corrected in any way during matching. Standardization is not considered correction in this case. |
|
Error Checking: Address Only: Address Corrected |
Postal Code Corrected |
Indicates whether the postal code was corrected during matching. Correction can include the addition of a postal extension: |
|
Error Checking: Address Only: Address Corrected |
City Corrected |
Indicates whether the city name was corrected during matching. Postal code input is used to determine the city name preferred by the postal service. |
|
Error Checking: Address Only: Address Corrected |
Street Corrected |
Indicates whether the street name was corrected during matching. Some correct street names may be changed to an alternate name preferred by the postal service. |
|
Error Checking: Address Only: Address Corrected |
Street Components Corrected |
Indicates whether any of the street components, such as the Pre Directional or Post Directional, were corrected during matching: |
|
Error Checking: Address Only |
Address Type |
Type of address. The following are common examples; actual values vary with vendors of postal matching software:
|
|
Error Checking: Address Only |
Parsing Country |
Country parser that was used for the final parse of the record. |
|
Country Specific: United States |
ZIP 5 |
The five-digit United States postal code. This applies to United States addresses only. |
|
Country Specific: United States |
ZIP 4 |
The four-digit suffix that is added to the five-digit United States postal code to further specify location. This applies to United States addresses only. |
|
Country Specific: United States |
Urbanization Name |
Urban unit name used in Puerto Rico. This applies to United States (Puerto Rico) addresses only. |
|
Country Specific: United States |
LACS Flag |
Indicates whether the address requires a LACS conversion. The Locatable Address Conversion System (LACS) provides new addresses when a 911 emergency system has been implemented. 911 address conversions typically involve changing rural-style addresses to city-style addresses, but on occasion they may involve renaming or renumbering existing city-style addresses.
This applies to United States addresses only. |
|
Country Specific: United States |
CART |
Four-character USPS Carrier route. This applies to United States addresses only. |
|
Country Specific: United States |
DPBC Check Digit |
Check digit for forming a delivery point bar code. This applies to United States addresses only. |
|
Country Specific: United States: Geography |
Metropolitan Statistical Area |
Metropolitan Statistical Area (MSA) number. For example, `0000' indicates that the address does not lie within any MSA, and typically indicates a rural area. This applies to United States addresses only. |
|
Country Specific: United States: Geography |
Minor Census District |
Minor Census District. This applies to United States addresses only. |
|
Country Specific: United States: Geography |
Latitude |
Latitude in degrees north of the equator: positive for north of the equator; negative for south (always positive for North America). |
|
Country Specific: United States: Geography |
Longitude |
Longitude in degrees east of the Greenwich Meridian: positive for east of GM; negative for west (always negative for North America). |
|
Country Specific: United States: Geography |
FIPS County |
The three-digit county code as defined by the Federal Information Processing Standard (FIPS). This applies to United States addresses only. |
|
Country Specific: United States: Geography |
FIPS Code |
The complete (state plus county) code assigned to the county by the Federal Information Processing Standard (FIPS). Because FIPS county codes are unique within a state, a complete FIPS Code includes the two-digit state code followed by the three-digit county code. This applies to United States addresses only. |
|
Country Specific: United States: Geography |
Census ID |
United States Census tract and block-group number. The first six digits are the tract number; the final digit is the block-group number within the tract. These codes are used for matching to demographic-coding databases. This applies to United States addresses only. |
|
Country Specific: United Kingdom |
Locality Code |
This applies to United Kingdom addresses only. For example, the following address is assigned Locality Code 23591:
Chobham Rd |
|
Country Specific: United Kingdom |
Locality Name |
This applies to United Kingdom addresses only. For example, the following address is assigned Locality Name KNAPHILL:
Chobham Rd |
|
Country Specific: United Kingdom |
County Name |
This applies to United Kingdom addresses only. |
|
Country Specific: Canada |
Installation Type |
Type of Canadian postal installation: For example, for the address, "PO Box 7010, Scarborough ON M1S 3C6," the Installation Type is `STN'. |
|
Country Specific: Canada |
Installation Name |
Name of Canadian postal installation. For example, for the address, "PO Box 7010, Scarborough ON M1S 3C6," the Installation Name is `AGINCOURT'. |
|
Country Specific: Hong Kong |
Delivery Office Code |
This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Office Code WCH:
Oracle |
|
Country Specific: Hong Kong |
Delivery Beat Code |
This applies to Hong Kong addresses only. For example, the following address is assigned the Delivery Beat Code S06:
Oracle |
|
Country Specific: Hong Kong |
Address 2 |
The second address line, assigned when both a street address and a building or floor address are present. This applies to Hong Kong only. |
Table 20-3 lists supported countries for some vendors of name and address cleansing software. This list varies by vendor.
The postal matching software used by Oracle9i Warehouse Builder Name and Address can be certified for various countries. The certification depends on the vendor of the postal matching software you use. Possible certifications include:
The Coding Accuracy Support System (CASS) is a process developed by the United States Postal Service (USPS) in cooperation with the mailing industry. The process provides mailers a common platform to measure the quality of address-matching software, focusing on the accuracy of five-digit ZIP Codes, ZIP+4 Codes, delivery point codes, and carrier route codes applied to all mail. All address lists used to produce mailings for automation rates must be matched by CASS-certified software. Oracle9i Pure Name and Address is CASS-certified. The CASS report is a text file specified by the USPS and produced by Oracle9i Pure Name and Address. To meet USPS requirements, the mailer must submit a CASS report in its original form to the USPS.
Canada Post has developed a testing program called Software Evaluation and Recognition Program (SERP) which evaluates software packages for their ability to validate and/or validate and correct mailing lists to Canada Post requirements. Postal programs that meet SERP requirements are listed on the Canada Post web site.
Customers who utilize Incentive Lettermail, Addressed Admail and/or Publications Mail must meet the Address Accuracy Program requirements. Customers can obtain a Statement of Accuracy by comparing their database to Canada Post's address data.
The Address Matching Approval System (AMAS®) is a software approval program that has been developed by Australia Post to improve the quality of addressing. It provides a standard by which to test and measure the ability of address-matching software to:
AMAS allows companies to develop address matching software which:
PreSort Letter Service prices are conditional upon customers using AMAS Approved Software with Delivery Point Identifiers (DPIDs) being current against the latest version of the PAF.
A declaration that your mail was prepared appropriately must be made when using the Presort Lodgement Document, available from Post Offices.
You can only produce one postal report per mapping. The postal report created is for the primary country specified in the Name and Address operator definition. Ideally, all addresses being mapped should be located in the primary country selected in the Definitions page of the Name and Address operator.
The Name and Address operator generates PL/SQL code, which calls the UTL_NAME_ADDR package installed in the Runtime Schema. A private synonym, NAME_ADDR, is defined in the target schema to reference the UTL_NAME_ADDR package. The UTL_NAME_ADDR package calls Java packages, which send processing requests to an external Name and Address server, which then interfaces with third-party Name and Address processing libraries, such as Trillium.
You can use the server property file, NameAddr.properties, to configure server options. This file is located under the ORACLE_HOME of your Oracle9i Warehouse Builder Server Side Install at owb/bin/admin. The code below illustrates several important properties with their default settings.
TraceLevel=0 SocketTimeout=120 ClientThreads=16 Port=4040
The TraceLevel property is often changed to perform diagnostics on server communication and view output from the postal matching program parser. Other properties are rarely changed.
NASvrTrace.log in the owb/bin/admin folder. This file shows all incoming and outgoing data, verifies that your mapping is communicating with the Name and Address Server, and that the Name and Address Server is receiving output from the service provider. The trace log shows all server input and output and is most useful for determining whether any parse requests are being made by an executing mapping. Set TraceLevel=1 to enable logging. However, tracing degrades performance and creates a large log file. Set TraceLevel=0 to disable logging for production.
runtime_schema.nas_connection table to enable the utl_name_addr package to establish a connection.
Whenever you edit the properties file or perform table maintenance, you must stop and restart the Name and Address Server for the changes to take effect. You must stop and restart the server manually if the auto shutdown property is FALSE and the server is running when you edit the file or perform the table maintenance.
ORACLE_HOME/owb/bin/win32/NAStop.bat.
ORACLE_HOME/owb/bin/unix/NAStop.sh.
You can automatically restart the Name and Address Server by invoking a mapping in Warehouse Builder. You can also restart the server manually.
To manually restart the Name and Address Server:ORACLE_HOME/owb/bin/win32/NAStart.bat.
ORACLE_HOME/owb/bin/unix/NAStart.sh.
Oracle9i Warehouse Builder Name and Address is built on name and address software and data supplied by third-party software vendors who specialize in name and address cleansing. This section discusses additional considerations that pertain to these products, as well as general parsing issues.
Name and Address parsing and correction can provide great improvements in data quality, but can degrade quality if not applied carefully. Name and Address parsing, postal matching, or any cleansing of incorrect data is error-prone because the problem domain of dirty data is not bounded.
Name and Address parsing, like any other type of parsing, depends on identification of keywords and patterns containing those keywords. Unlike computer languages, which contain a small set of keywords, free form Name and Address data of any region is very difficult to parse because the keyword set is so large and the set is never 100% complete. For example, one name and address cleansing software provider's pattern table for USA contains over 80,000 defined keywords and over 5,000 patterns. Keyword sets are built by analyzing millions of records, but each new data set is likely to contain some undefined keywords.
Because most free-form Name and Address record lines contain common patterns of numbers, single letters, and alphanumeric strings, parsing can often be performed based on just the alphanumeric patterns. In more difficult cases, alphanumeric patterns may be ambiguous, with possible interpretation as either an address line or name line. In other cases, a particular pattern may not be found. With computer languages, compilation errors are reported and an executable image is not produced; with Name and Address parsing errors, parsing status codes are set and Name and Address elements may not be parsed.
Because failure of the parsing engine at some point is highly probable, perform all Name and Address processing with error processing in mind. Use status codes to control the data mapping. Since the criteria for quality vary between applications, numerous flags are available to help you determine the quality of a particular record. For countries with postal matching support, the Is Good Group flag is the best measure of quality because it verifies that an address is a valid entry in a postal database. For CASS or SERP certified mailings, this flag may be the best criterion for acceptance or rejection of a record.
If you do not perform postal reporting, an address does not have to be found in a postal database to be acceptable. For example, it may not be possible to locate street intersection addresses or addresses using building names may not be locatable in a postal database, but they may still be deliverable. If the parse status is good, some benefit will be derived from standardization of address elements. If the Is Good Group flag indicates postal matching failure, several parser warning/error flags are also available to help determine the parsing status. The Is Parsed flag indicates success or failure of the parsing process. If Is Parsed indicates parsing success, you may still wish to check the parser warning flags, which indicate unusual data. If parser warnings are present, it may be desirable to check those records manually. If Is Parsed indicates parsing failure, preservation of your original data is necessary to prevent data loss.
A wide range of input roles is available to match the granularity of input data. Line based input roles are available for line oriented input, and discrete roles are available for atomic attributes such as First Name. Whenever possible, use discrete roles because they give the parser more information about the data content, and result in better parsing. Some of the discrete input roles overlap less discrete roles; for example, the combination of city, state, and postal code redefines the Last Line role. Such redefining assignments should be avoided. Where they occur, the more discrete roles take precedence because they are more specific.
Map discrete Name and Address attributes whose original value you want to retain directly to the target (map them around the Name and Address operator). This is very common for first, middle, and last names because it is not a good practice to change a first name to a standardized first name (for example, changing Peggy to Margaret). This direct mapping prevents losing name data if a personal name is parsed as a firm name. To obtain a standardized first or middle name for matching and merging (data deduplication), you can map the same attributes into the Name and Address operator. However, only the recoded outputs, such as title, gender, or standardized name, are used as output from the operator. There is no advantage to using first, middle, or last name output because it mirrors the input. This recommendation only applies to discrete data where the first, middle, and last name is already known.
Consider using the splitter operator to map good records to one target and bad records to another target. If the percentage of bad records is very low, or if the data set is small, you may consider manual correction of bad records. You can flag each record manually corrected record for later mapping to the target schema. If the IS_GOOD_GRP flag is F and the records are mapped to a special target for manual correction, you can map additional components that indicate the success level of the postal match. Components are available to indicate matching success at the city, street name, street number range, or street component (for example, the pre-directional, post-directional, or street type) level.
|
|
![]() Copyright © 2001, 2003 Oracle Corporation. All Rights Reserved. |
|