Using the Product Catalog Database Loader
WebLogic Commerce Server provides a DBLoader program that you can use to bulk load data into the product catalog database. While you can use a WebLogic Commerce Server administration screen to add new item or category data one record at a time, that approach is impractical when you need to load hundreds or thousands of records. The DBLoader program is also useful if you want to load legacy data from an existing database into the WebLogic Commerce Server database.
You can also use a database vendor's loader program, such as Oracle SQL*Loader, or a third-party data loader to populate the product catalog database.
This topic includes the following sections:

The Input File for DBLoader
The dbloader.properties File
Running the DBLoader Program
DBLoader Log Files
DBLoader Validations
Important Database Considerations
Using Database-Specific Data Loaders
Using Third-Party Data Loaders
The Input File for DBLoader
The WebLogic Commerce Server DBLoader program loads data that you provide in a text file into the product catalog database. Data is loaded one table at a time; create a separate input file for each table that you want to update.
The input data file is, by default, a comma-separated value (CSV) text file. The input file has the following structure:
First Row
The header of the file must identify the number of data records in the file and the name of the database table to be loaded.
For example, the header line might contain:
130,WLCS_PRODUCT
Second Row
The second row identifies the table field (column) names into which you are loading data. You must include the primary key field or fields in the input file, and preface each primary key field name with an asterisk (*). Apart from the primary key fields, all other fields default to NULL; thus, you may omit field names for which NULL is an acceptable value and list only those that receive non-NULL values.
For example, the second line of the input data file might contain:
*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE
Third Row
The third row specifies the data type of each field being loaded. See The Product Catalog Database Schema for information about the product catalog schema and the data types used.
For example, the third line of the input data file might contain:
VARCHAR,VARCHAR,CHAR,VARCHAR,VARCHAR,DATE
Note: On the data type line of the input file, it is not necessary to include the length of the data type, such as VARCHAR(20) or VARCHAR2(20). Simply use VARCHAR for strings, NUMBER instead of (for example) NUMBER(16,4), and DOUBLE instead of DOUBLE PRECISION.
Fourth Through N Rows
All subsequent lines in the input data file contain the data values. The following is an example of a simple input file:
3,WLCS_PRODUCT
*SKU,NAME,IN_STOCK,EST_SHIP_TIME,SPECIAL_NOTES,CREATION_DATE
VARCHAR,VARCHAR,CHAR,VARCHAR,VARCHAR,DATE
P123,CoolKid,N,Out of stock until further notice,Special order only,02-Oct-2000
P124,FastKid,Y,One week,No special order,02-Oct-2000
P125,RadSneakers,Y,,regular stock,02-Oct-2000
Note: DATE field values must be entered in the format DD-MMM-YYYY. A DATE field cannot be an empty string; its value must be either NULL or a valid date.
Empty input strings from the data file are inserted into the database as empty strings. You must account for each unspecified field in the input record by including the delimiter character (by default, a comma) in the correct position, matching the position of the fields you listed in line 2, the field names. For example:
P125,RadSneakers,Y,,regular stock,02-Oct-2000
In the previous example, a value for the fourth identified field (EST_SHIP_TIME) was not specified. This is acceptable because the field is not part of the record's primary key; the field's value is stored as an empty string.
Note: If your intention is to store a null value in the database for a non-primary-key field, enter NULL in the correct position for that field in the record. Do not enclose NULL in quotes; enclosing the word NULL in quotes causes the field to be stored as a literal string.
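For example, the following variation of an earlier sample record stores NULL, rather than an empty string, in the EST_SHIP_TIME field:

P125,RadSneakers,Y,NULL,regular stock,02-Oct-2000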
The dbloader.properties File
The WebLogic Commerce Server DBLoader program uses a properties file named dbloader.properties to determine which JDBC driver, database connection, and login to use.
This file resides in the WL_COMMERCE_HOME directory. WL_COMMERCE_HOME is the directory where you installed WebLogic Commerce Server.
Comment lines are prefixed with the # character. Both comment lines and blank lines are allowed.
The following describes the values you can set in this properties file.

jdbcdriver: The class name of the JDBC driver used to connect to the database.
connection: The JDBC connection URL of the target database.
dblogin: The database login name, or none if the database does not require one.
dbpassword: The password for the database login, or none.
delimiter: The character that separates field values in the input file. The default is a comma.
timestamptable: The name of the catalog table that contains the modification timestamp field.
timestampfield: The name of the timestamp field in that table.
encoding: The character encoding of the input data file, such as UTF8.
commitTxn: The number of records to process in each commit transaction.
Listing 3-1 shows a sample dbloader.properties file.
Listing 3-1 Sample dbloader.properties File
jdbcdriver=COM.cloudscape.core.JDBCDriver
connection=jdbc:cloudscape:Commerce
dblogin=none
dbpassword=none
delimiter=,
timestamptable=wlcs_category
timestampfield=modified_date
encoding=UTF8
commitTxn=50
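For example, if you were loading the catalog tables in an Oracle database rather than Cloudscape, the file might instead look like the following sketch; the host, port, SID, login, and password shown here are placeholder values, not settings from an actual installation:

jdbcdriver=oracle.jdbc.driver.OracleDriver
connection=jdbc:oracle:thin:@myhost:1521:mySID
dblogin=wlcs
dbpassword=wlcs
delimiter=,
timestamptable=wlcs_category
timestampfield=modified_date
encoding=UTF8
commitTxn=50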
Running the DBLoader Program
You use the loaddata script to run the DBLoader program.
The script resides in a platform-specific directory under the WL_COMMERCE_HOME directory; on UNIX systems, the script is loaddata.sh.
The loaddata script sets up the required environment, using the set-environment script, and then invokes the DBLoader program with the options you supply.
Before you can run the loaddata script, make sure that the set-environment script specifies the same database as the dbloader.properties file. The set-environment script resides in the same directory as the loaddata script.
For example, if the dbloader.properties file specifies the connection jdbc:cloudscape:Commerce, then the set-environment script should contain SET DATABASE=CLOUDSCAPE.
DBLoader runs independently of WebLogic Commerce Server, so you do not need to stop the server to run the loader. However, if you are running WebLogic Commerce Server with a Cloudscape database, note that Cloudscape does not allow more than one connection at a time; in that case, you must stop the server before running DBLoader.
If you are running WebLogic Commerce Server with Oracle, the only drawback might be slower performance while the data is being loaded into the database.
Note: You might want to back up the particular tables that you are about to update before running DBLoader. The DBLoader program does not keep history records in the database.
To Run the Program
The command to run the program has the following format:
prompt> loaddata { -insert | -update | -delete } input-file.csv
On UNIX systems, the loaddata.sh file must have execute permission. A typical way to set this is with the command:
$ chmod +x loaddata.sh
You must select one of the three possible operations: -insert, -update, or -delete.
For example:
prompt> loaddata -update category.csv
In the previous example, the DBLoader program will update rows in the product catalog database that match the primary keys specified in the category.csv input file.
To insert, update, or delete data in several tables, run the loaddata script separately for each table, providing the corresponding input file name as a parameter. The order in which you load tables must respect the same data-integrity rules as any other SQL operations: for example, insert rows into a parent table with the primary key constraint before inserting rows into a child table with the foreign key constraint.
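For example, assuming input files named after their target tables, and a child mapping table such as WLCS_PRODUCT_CATEGORY with foreign keys into the category and product tables, the inserts might be ordered as follows:

prompt> loaddata -insert wlcs_category.csv
prompt> loaddata -insert wlcs_product.csv
prompt> loaddata -insert wlcs_product_category.csv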
DBLoader Log Files
The WebLogic Commerce Server DBLoader creates two audit trail logs: dbloader.log and dbloader.err.
If these files do not already exist, they are created. Otherwise, the existing audit trails are overwritten by each DBLoader operation. Both files reside in the same directory where you run the loaddata script.
The dbloader.log file records summary information about the load operation.
If any errors occurred during the attempted database load operation, the dbloader.err file captures information about each error.
DBLoader Validations
The DBLoader program checks the number of fields affected by the load (as specified in the second line of the input data file) against the number of input fields in each record. Because the field delimiter is a comma (by default), this character is not allowed in a string input field. If extra commas are supplied inadvertently, such as punctuation in a LONG_DESC (Long Description) field, an error results and is noted in the dbloader.err file. To avoid this type of error, carefully check the number of commas you use to separate the input data field values, or select a different delimiter character and specify it in the dbloader.properties file. For more information, see the section The dbloader.properties File.
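For example, to use a vertical bar rather than a comma as the delimiter, you would set the following line in dbloader.properties (and separate the field values in your input data file with the same character):

delimiter=|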
All errors and exceptions are displayed in the console where the DBLoader program is running. Records that contain errors are skipped, and processing continues until the end of the file. (The program does not roll back the transaction if an error has occurred.)
Important Database Considerations
This section describes some important database considerations that you should keep in mind while using the DBLoader program.
Some databases support cascading deletes, in which deleting a record from a parent table automatically deletes the matching records from child tables. Because Cloudscape does not provide this functionality, run the loaddata script to delete records from child tables before deleting records from parent tables.
For related information, please see The Product Catalog Database Schema.
Another consideration is the character encoding of your input data file. You can identify the default file encoding by checking the System property named file.encoding, as follows:
System.out.println(System.getProperty("file.encoding"));
If the file.encoding property differs from the encoding of the text data you want to process, then you must perform the conversion yourself.
Currently, the Java 2 SDK 1.2.2 can convert several file encodings to Unicode 2.0. For details, please see http://java.sun.com/products/jdk/1.2/docs/guide/internat/encoding.doc.html.
What this means to your development group: normally, you enter or extract data using the default encoding of your operating system, so this kind of conversion is rarely necessary. Only when the encoding of your data file differs from the system's default encoding must you convert the data yourself.
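If you do need to perform such a conversion before running DBLoader, a small Java program can do it. The following is a minimal sketch, assuming the input file is encoded in ISO-8859-1 and using the placeholder file names legacy.csv and converted.csv; it rewrites the data in the platform's default encoding:

import java.io.*;

public class ConvertEncoding {
    public static void main(String[] args) throws IOException {
        // Read the input file using its actual encoding (ISO-8859-1 is
        // an assumed example; substitute the encoding of your data).
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("legacy.csv"), "ISO-8859-1"));
        // Write the output file using the platform's default encoding.
        PrintWriter out = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream("converted.csv")));
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line);
        }
        in.close();
        out.close();
    }
}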
Using Database-Specific Data Loaders
Most database management systems provide a data loader utility. In the case of Oracle, the data load utility is known as SQL*Loader. This section summarizes the capabilities of SQL*Loader. For details about SQL*Loader, see the Oracle 8i Utilities Guide. An online copy is available at http://technet.oracle.com/.
The examples used in this section are based on a simple ASCII file containing a few comma-separated values (SKU, IN_STOCK, VISIBLE, NAME) taken from a sample WLCS_PRODUCT table, which is described in The Product Catalog Database Schema.
Note: The example shown in this section does not use all of the columns from WLCS_PRODUCT. Your actual comma-separated values (CSV) file may contain more columns than we show here. We are merely attempting to show how to conduct the import operation once you have the CSV file ready.
The name we will use for our sample data file is sample.csv. Based on the columns mentioned earlier (SKU, IN_STOCK, VISIBLE, NAME), the contents of that file might look like the following:
"0,3,4",0,0,"Growing Herbs from Seed, Cutting, and Root"
"0,2,1",0,0,"The Perfect Storm: A True Story of Men Against the Sea"
"0,2,2",0,0,"The Worst-Case Scenario Survival Handbook"
"0,4,0",0,0,"Acute Asthma: Assessment and Management"
"0,4,1",0,0,"Communications Technology Explained"
"0,4,2",0,0,"Modern Plastics Handbook"
Once your data file is ready to populate the WLCS_PRODUCT table, you must create a control file for SQL*Loader. The control file identifies the data file to be read, how the pieces of information are delimited in the file, and the actual column locations of the destination table. We will name our control file sample.ctl.
LOAD DATA
INFILE 'sample.csv'
APPEND INTO TABLE WLCS_PRODUCT
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
(SKU, IN_STOCK, VISIBLE, NAME)
At a system prompt, invoke SQL*Loader with a command such as:
sqlldr userid=bea_systems/bea_systems control=sample.ctl log=sample.log
To review the results of your data load, see sample.log.
Using Third-Party Data Loaders
There are a variety of data loaders available on the market today to assist in the extraction and loading of information. Please be sure to research the use of these tools to ensure success within your environment.
The Data Movement module of DataStage provides a comprehensive data extraction, transformation, and loading toolset designed for building Operational Data Stores (ODS), data marts and enterprise data warehouses. For more information, please see http://www.informix.com.
PowerConnect is Informatica's family of packaged software products that helps customers easily extract data and metadata from hard-to-access ERP and other legacy applications. This data is then delivered to PowerCenter, Informatica's data integration software hub, which provides robust capabilities for transforming the data and delivering it to downstream data warehouses, data marts, and analytic applications. For more information, please see http://www.informatica.com.
Data Junction is a visual design tool for rapidly integrating and transforming data between hundreds of applications and structured data formats. For more information, please see http://www.datajunction.com.
ETI·EXTRACT moves and integrates data across the value chain and multiple business processes. The product automates the writing of programs that retrieve the data needed from any system, transform it and load it into any other system while capturing a complete history of that process. For more information, please see http://www.eti.com.
Copyright © 2001 BEA Systems, Inc. All rights reserved.