This section provides information on how to use the Batch Loader utility to check in (insert), delete, and update a large number of files simultaneously on a Content Server instance.
The Batch Loader utility can save you time and effort by automating the file batch loading process. The following are examples of when to use the Batch Loader:
You just purchased the WebCenter Content software, and you want to check in all of your existing files with metadata that exists in a database.
You have documents checked in to the Content Server repository, and you just created a new custom metadata field. You can use the Batch Loader to add the values you specify for the new metadata field to each existing content item.
You want to remove a large number of specific files from the system.
Note:
For the Batch Loader utility to function correctly with an Oracle WebLogic Server instance, you must have JDBC connection settings configured. See Section 4.5.2.
The Batch Loader performs actions that are specified in a batch load file, which is a text file that tells the Batch Loader which actions to perform and what metadata to assign to each content item in the batch.
A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.
Important:
Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.
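Because a field name with the wrong capitalization is easy to miss by eye, a small check can flag such names before a batch is loaded. The following is a minimal sketch, not part of the Batch Loader itself; the function name and the list of known fields are assumptions drawn from the field names used in this section.

```python
# Field names as they are spelled in this section (illustrative subset).
KNOWN_FIELDS = [
    "Action", "dDocName", "dDocType", "dDocTitle", "dDocAuthor",
    "dSecurityGroup", "primaryFile", "dInDate", "dRevLabel",
]

def find_case_mismatches(field_names):
    """Return (found, expected) pairs for names that match a known field only when case is ignored."""
    by_lower = {name.lower(): name for name in KNOWN_FIELDS}
    mismatches = []
    for name in field_names:
        expected = by_lower.get(name.lower())
        if expected is not None and expected != name:
            mismatches.append((name, expected))
    return mismatches
```

Names that are not in the list at all (for example, custom metadata fields) are ignored by this sketch rather than reported.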
Each file record ends with an <<EOD>> (end of data) marker.
A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example, # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.
The following is an example of a file record:
    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
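The record structure described above (name/value pairs, comment lines, and an <<EOD>> terminator) can be read with a few lines of code. The following is a minimal sketch that assumes each name/value pair sits on its own line; the function name is hypothetical, and the real Batch Loader may treat edge cases differently.

```python
def parse_batch_load_text(text):
    """Split batch load text into file records, each a dict of name/value pairs."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("# "):
            continue  # blank line, or a comment ("#" followed by a space)
        if line == "<<EOD>>":
            records.append(current)  # end of the current file record
            current = {}
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value pair: {line!r}")
        current[name] = value
    return records
```

A line such as #primaryFile=test.txt is not recognized as a comment by this sketch, which mirrors the rule that the comment character must be followed by a space.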
Valid actions for batch loading are Insert, Delete, and Update.
If no action is specified for a file, the system tries to perform an update.
Each file record can have only one action, but file records with different actions can be present in the same batch load file.
The logic process for each action is different.
The insert action checks a new file in to the Content Server repository. Figure 5-1 illustrates the insert action.
If the Content ID (dDocName) does not exist in the Content Server database, then a new file is created.
If the Content ID (dDocName) exists in the Content Server database, and no revision (dRevLabel) is specified, then a new revision is created.
If the Content ID (dDocName) and the specified revision (dRevLabel) exist in the Content Server database, then no action is performed.
Figure 5-1 The Insert Action Sequence for Checking In a New File
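The three insert rules above can be summarized as a small decision function. This is a sketch for illustration only; the function name and the dictionary that stands in for the Content Server database are assumptions, and a combination that the rules do not mention is reported as such rather than guessed.

```python
def insert_outcome(existing_revisions, doc_name, rev_label=None):
    """existing_revisions maps a Content ID to the set of revision labels already in the system."""
    if doc_name not in existing_revisions:
        return "new file created"
    if rev_label is None:
        return "new revision created"
    if rev_label in existing_revisions[doc_name]:
        return "no action"
    # dDocName exists and dRevLabel is specified but not present:
    # the rules listed above do not describe this combination.
    return "not described in this section"
```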
The following table defines the fields required for successful performance of an insert action.
Note:
Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.
Field Length: Maximum number of characters permitted in the field.
Carried Over: If the next record does not contain this field, the value of this field will be taken from the previous record.
Important:
If you have defined any custom metadata fields as required fields, those fields also need to be defined for an insert action.
Required Items | Field Length | Carried Over | Definition |
---|---|---|---|
Action=insert | N/A | Yes | The command to insert a file. The term Action is case sensitive and must have an initial capital letter. |
dDocName | 30 | No | The metadata field named Content ID. |
dDocType | 30 | Yes | The metadata field named Type. |
dDocTitle | 80 | No | The metadata field named Title. |
dDocAuthor | 30 | Yes | The metadata field named Author. |
dSecurityGroup | 30 | Yes | The metadata field named Security Group. |
primaryFile | N/A | N/A | The metadata field named Primary File. The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; otherwise, the batch load file path is used. By default, the length of the Primary File name cannot exceed 80 characters (of which the extension can be at most 8 characters). |
dInDate | N/A | No | The metadata field named Release Date. |
<<EOD>> | N/A | N/A | Indicates the end of data for the file record. |
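The field lengths in the table lend themselves to a quick pre-check of each file record. The sketch below is illustrative; the function name is hypothetical, and it assumes that the 80-character limit applies to the file name without its directory path, which the table does not state explicitly.

```python
# Maximum lengths taken from the Field Length column above.
MAX_LENGTHS = {
    "dDocName": 30, "dDocType": 30, "dDocTitle": 80,
    "dDocAuthor": 30, "dSecurityGroup": 30,
}

def length_problems(record):
    """Return a list of messages for values that exceed the documented lengths."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        if len(record.get(field, "")) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    primary = record.get("primaryFile")
    if primary:
        # Assumption: only the file name (not the directory path) counts toward the limit.
        file_name = primary.replace("\\", "/").rsplit("/", 1)[-1]
        _, dot, extension = file_name.rpartition(".")
        if len(file_name) > 80:
            problems.append("primaryFile name exceeds 80 characters")
        if dot and len(extension) > 8:
            problems.append("primaryFile extension exceeds 8 characters")
    return problems
```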
The following code fragments show the batch load file syntax for inserting files. This example shows two file records.
The first file record includes all required fields and the action statement, Action=insert. The second file record does not list the required fields dDocType, dDocAuthor, or dSecurityGroup. However, the information for these items is taken from the previous record. Also, the second record does not specify an action, so the insert action is carried over. Therefore, if the Content ID HR003 does not exist, the file will be inserted. However, if the Content ID does exist, it will not be inserted because the action is insert and not update.
First record:
    Action=insert
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Information Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=3/15/97
    <<EOD>>
Second record:
    dDocName=HR003
    dDocTitle=Performance Review
    primaryFile=hr003.doc
    dInDate=3/15/97
    <<EOD>>
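The carry-over behavior in this example can be made explicit in code. The following sketch fills in missing carried-over fields from the previous record; the function name is hypothetical, and the field list is taken from the items that this section marks as carried over.

```python
# Fields that this section describes as carried over to the next record.
CARRIED_OVER = ("Action", "dDocType", "dDocAuthor", "dSecurityGroup", "SetFileDir")

def apply_carry_over(records):
    """Return copies of the records with carried-over fields filled from the previous record."""
    resolved = []
    previous = {}
    for record in records:
        merged = dict(record)
        for field in CARRIED_OVER:
            if field not in merged and field in previous:
                merged[field] = previous[field]
        resolved.append(merged)
        previous = merged  # values keep flowing forward across several records
    return resolved
```

Fields such as dDocName and dDocTitle are deliberately left out, because they are not carried over and must appear in every record.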
The delete action deletes one or all revisions of an existing file from the Content Server repository. If the specified Content ID (dDocName) does not exist in the Content Server database, no action is performed. Figure 5-2 illustrates the delete action.
The following table defines the fields required for successful performance of a delete action.
Required Items | Definition |
---|---|
Action=delete | The command to delete a file. The term Action is case sensitive and must have an initial capital letter. |
dDocName | The metadata field named Content ID. |
<<EOD>> | Indicates the end of data for the file record. |
The following example shows the batch load file syntax for deleting files. This example shows two file records. The first file record will delete all revisions of the Content ID HR001. The second file record will delete revision 2 of the content item HR002.
    Action=delete
    dDocName=HR001
    <<EOD>>
    Action=delete
    dDocName=HR002
    dRevLabel=2
    <<EOD>>
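When many items need to be removed, the delete records can be generated rather than typed. The sketch below builds the text of a batch load file from a list of Content IDs and optional revision labels; the function name is hypothetical, and writing one name/value pair per line is an assumption about layout.

```python
def delete_records(items):
    """Build batch load text from (content_id, rev_label) pairs; rev_label may be None."""
    lines = []
    for content_id, rev_label in items:
        lines.append("Action=delete")
        lines.append(f"dDocName={content_id}")
        if rev_label is not None:
            lines.append(f"dRevLabel={rev_label}")  # limit the delete to one revision
        lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"
```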
The update action updates existing content items. One of the following actions occurs, depending on what items are present in the file record and what content exists in the system:
A new revision of an existing content item is created.
An existing file's metadata is updated.
A new content item is inserted (Action=insert is performed).
Note:
Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.
A new revision is created when one of the following scenarios occurs:
Scenario | Content ID (dDocName) | Revision (dRevLabel) | Release Date in Batch Load file (dInDate) |
---|---|---|---|
Scenario 1 | Exists in Content Server instance | Not specified in the batch load file. | After the release date of the latest revision of the file in the system. |
Scenario 2 | Exists in Content Server instance | Specified in the batch load file, but does not exist in Content Server instance. | After the release date of the latest revision of the file in the system. |
The following table defines the fields required for successful performance of an update action.
Required Items | Field Length | Carried Over | Definition |
---|---|---|---|
Action=update | N/A | Yes | The command to update a file. The term Action is case sensitive and must have an initial capital letter. |
dDocName | 30 | No | The metadata field named Content ID. |
dDocType | 30 | Yes | The metadata field named Type. |
dDocTitle | 80 | No | The metadata field named Title. |
dDocAuthor | 30 | Yes | The metadata field named Author. |
dSecurityGroup | 30 | Yes | The metadata field named Security Group. |
primaryFile | N/A | N/A | The metadata field named Primary File. If only the metadata is being updated, the primaryFile field is not required, but dRevLabel then becomes a required field. If dRevLabel is specified and matches a revision label that exists in the Content Server instance, the primary file already stored for that revision is used. The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; otherwise, the batch load file path is used. |
dInDate | N/A | No | The metadata field named Release Date. |
<<EOD>> | N/A | N/A | Indicates the end of data for the file record. |
This example assumes that two files are already checked in to the system with the following metadata:
HR001 has a Release Date of 9/26/98 and Revision of 1
HR002 has a Release Date of 3/15/99 and Revision of 2
The first file record, Content ID HR001, exists in the system, but it does not have a Revision (dRevLabel) specified in the batch load file. Therefore, the Batch Loader will compare the Release Date of the latest revision in the system with the Release Date specified in the batch load file. Since 2/20/99 is after 9/26/98, a new revision 2 for HR001 is added.
The second file record, Content ID HR002, exists in the system and has a Revision (dRevLabel) specified, but Revision 3 does not exist in the system. Therefore, a new revision 3 for HR002 is added.
    Action=update
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=2/20/99
    <<EOD>>
    dDocName=HR002
    dDocTitle=Payroll Change Form
    primaryFile=hr002.doc
    dInDate=2/20/99
    dRevLabel=3
    <<EOD>>
This example assumes that one file is already checked in to the system with the following metadata:
Content ID = HR003
Release Date = 3/15/97
Revision = 1
Title = Performance Review
Author = Smith
Because Revision 1 of the Content ID HR003 exists in the system (and is not in an active workflow), the revision will be updated with the new Title, Author, and Release Date metadata.
    Action=update
    dDocName=HR003
    dDocType=Form
    dDocTitle=Performance Review Template
    dDocAuthor=Smith
    primaryFile=hr003.doc
    dInDate=2/20/99
    dRevLabel=1
    <<EOD>>
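The outcomes of the update action can be sketched as a decision function that follows the scenario table and the metadata-update example. This is an illustration under stated assumptions, not the Batch Loader's actual logic: the function name and the dictionary standing in for the system state are hypothetical, and any combination that this section does not describe is reported as such.

```python
from datetime import date

def update_outcome(system, doc_name, in_date, rev_label=None):
    """system maps a Content ID to {"revisions": set of labels, "latest_release": date}."""
    item = system.get(doc_name)
    if item is None:
        return "insert"  # the Content ID does not exist, so an insert is performed
    if rev_label is not None and rev_label in item["revisions"]:
        return "update metadata"  # the specified revision already exists
    # Scenario 1 (no dRevLabel) and Scenario 2 (dRevLabel not in the system)
    # both require a release date after that of the latest revision.
    if in_date > item["latest_release"]:
        return "new revision"
    return "not covered by this section"
```

For instance, HR001 with a latest release date of 9/26/98 and a batch load release date of 2/20/99 yields a new revision, as in the first example.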
The following table lists the optional parameters you can use in any file record in a batch load file.
In a batch load file, there are two methods you can use to override the primary and alternate formats assigned to a content item check-in:
Specifying a value for the primaryFile:format parameter, or specifying a value for the alternateFile:format parameter, or both. However, it is possible to override these values by using the primaryOverrideFormat or alternateOverrideFormat parameters. It is also possible that certain components will force specific formats on certain types of check-ins, or that application functionality in some components forces a different format.
Specifying a value for the primaryOverrideFormat parameter, or specifying a value for the alternateOverrideFormat parameter, or both. However, these will only work as parameters in the batch load file if you enable the IsOverrideFormat configuration variable. Note that using this method will override any values that you set for the primaryFile:format and alternateFile:format parameters.
Optional Parameters | Definition |
---|---|
dRevLabel | The metadata field named Revision. Maximum field length is 10 characters. Values must be an integer or comply with the Major/Minor Revision Label Sequence established by the System Properties settings. |
dDocAccount | The metadata field named Accounts. Maximum field length is 30 characters. This field is not carried over to the next file record. Do not specify this field if accounts are not enabled. If accounts are enabled and this field is not specified, dDocAccount will be set to an empty value. |
xComments | The metadata field named Comments. Maximum field length is 255 characters. |
dOutDate | The metadata field named Expiration Date. The dOutDate must use the date format of the locale of the user executing the Batch Loader. For example, the English-US date format is Time information is optional. If you specify the time, only the |
primaryFile:path | Specifies the location of the file. If a primaryFile:path value is specified, the value overrides the value specified for the primaryFile parameter. However, the primaryFile:path value is not used to determine the file conversion format. If a value for primaryFile:path is not specified, the location is determined from the primaryFile value. This parameter uses the following syntax: primaryFile:path=complete_path/primary_file_name |
primaryFile:format | Specifies the file format to use for the Primary File. This file format overrides the one specified by the file extension of the file and the value specified for the primaryFile parameter. If a primaryFile:format value is not specified, the file format is determined from the file extension for the primaryFile value. This parameter uses the following syntax: primaryFile:format=application/conversion_type |
alternateFile | The metadata field named Alternate File. The Alternate File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used. If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.) |
alternateFile:path | Specifies the location of the alternate file. If an alternateFile:path value is specified, the value overrides the value specified for the alternateFile parameter. However, the alternateFile:path value is not used to determine the file conversion format. If an alternateFile:path value is not specified, the location is determined from the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: alternateFile:path=complete_path |
alternateFile:format | Specifies the file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file and the value specified for the alternateFile parameter. If an alternateFile:format value is not specified, the file format is determined from the file extension for the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: alternateFile:format=application/conversion_type |
webViewableFile | The webViewableFile name can be a complete path or just the file name. If a webViewableFile value is specified, then the conversion process is not performed. If only a file name is specified, the location of the file is determined as follows: If the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used. If the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.) |
webViewableFile:path | Specifies the location of the web viewable file. If a webViewableFile:path value is specified, the value overrides the value specified for the webViewableFile parameter. However, the webViewableFile:path value is not used to determine the file conversion format. If a webViewableFile:path value is not specified, the location is determined from the webViewableFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: webViewableFile:path=complete_path |
webViewableFile:format | Specifies the file format to use for the web viewable file. This file format overrides the one specified by the file extension of the file and the value specified for the webViewableFile parameter. If a webViewableFile:format value is not specified, the file format is determined from the file extension for the webViewableFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: webViewableFile:format=application/conversion_type |
primaryOverrideFormat | Specifies which file format to use for the Primary File. This file format overrides the one specified by the file extension of the file. This option works as a parameter only if you enable the IsOverrideFormat configuration variable. You can set this variable by selecting Allow Override Format in the System Properties application. However, the recommended alternative is to use the primaryFile:format parameter. |
alternateOverrideFormat | Specifies which file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file. This option works as a parameter only if you enable the IsOverrideFormat configuration variable. You can set this variable by selecting Allow Override Format in the System Properties application. However, the recommended alternative is to use the alternateFile:format parameter. |
SetFileDir | Specifies the directory where the Primary Files and Alternate Files are located. This field is carried over to the next file record. |
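The precedence among the format parameters for the primary file can be sketched as follows. This is an illustration only: the function name and the extension-to-format mapping are hypothetical, and the sketch ignores the possibility, noted above, that a component forces a different format.

```python
def effective_primary_format(record, is_override_format_enabled, extension_formats):
    """Pick the primary file format using the precedence described in this section."""
    # 1. primaryOverrideFormat wins, but only when IsOverrideFormat is enabled.
    if is_override_format_enabled and "primaryOverrideFormat" in record:
        return record["primaryOverrideFormat"]
    # 2. Otherwise an explicit primaryFile:format is used.
    if "primaryFile:format" in record:
        return record["primaryFile:format"]
    # 3. Otherwise the format follows from the file extension.
    #    extension_formats is a hypothetical lookup table for this sketch.
    extension = record["primaryFile"].rsplit(".", 1)[-1].lower()
    return extension_formats.get(extension)
```

The same ordering would apply to the alternate file with alternateOverrideFormat and alternateFile:format.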
Any custom metadata field that has been defined in the Configuration Manager can be included in a file record.
If you have defined any custom metadata fields as required fields, those fields must be defined for an insert action or an update action.
If a custom metadata field is not a required field, but it has a default value (even if blank), then the default value will be used if the value is not specified in the batch load file.
When specifying a custom metadata field value, the field name must be preceded by an x. For example, if you have a custom metadata field called Location, then the batch load file entry will be xLocation=value.
Keep in mind that some add-on products use custom metadata fields. For example, if you have PDF Watermark, you will have created a field called Watermark. To include this field in a batch load file, precede it with an x just like any other custom metadata field (for example, xWatermark).
This section covers these topics:
Section 5.2.3, "Creating a Batch Load File from the BatchBuilder Window"
Section 5.2.5, "Creating a Batch Load File from the Command Line"
You can use any method you prefer to create a batch load file, if the resulting text file conforms to the batch load file syntax requirements. However, the Batch Loader provides a tool called the BatchBuilder to assist you in creating batch load files.
The BatchBuilder creates a batch load file based on the files in a specified directory. The BatchBuilder reads recursively through all the sub-directories to create the batch load file.
A mapping file tells the BatchBuilder how to determine the metadata for each file record. You can use the BatchBuilder to create and save custom Mapping Files.
You can run the BatchBuilder from the standalone application interface or from the command line.
The BatchBuilder can also be used to create external collections of content, which are indexed and stored in a separate search collection rather than in the Content Server database. You can set up read-only external collections, where users can search for content but cannot update metadata or delete content. This option is recommended when external content is also included in another Content Server instance.
If you plan to use the Batch Loader utility to update and insert a large number of files on your Content Server instance simultaneously, you must create a batch load file. Two of the optional parameters that you can include in your batch load file are primaryOverrideFormat and alternateOverrideFormat. However, these options only work as parameters in the batch load file if you enable the IsOverrideFormat configuration variable. You can set this variable using the System Properties application.
Mapping files are text files that have a .hda extension, which identifies them as a type of data file used by the Content Server instance.
For more information on HDA files, LocalData properties, and ResultSets, see Oracle Fusion Middleware Developing with Oracle WebCenter Content.
The metadata mapping can be defined in one of two formats:
As name/value pairs in a LocalData definition, a mapping file would look like the following:
    @Properties LocalData
    dDocName=<$filename$>.<$extension$>
    dInDate=<$filetimestamp$>
    @end
As a BatchBuilderMapping ResultSet, a mapping file would look like the following:
    @ResultSet SpiderMapping
    2
    mapField
    mapValue
    dDocName
    <$filename$>.<$extension$>
    dInDate
    <$filetimestamp$>
    @end
The following values can be used in a mapping file:
Value | Description | Example |
---|---|---|
Normal string | All files will have the specified metadata value. | All files will be the Document content type. |
Idoc script | Any supported Idoc script. For more information, see Oracle Fusion Middleware Developing with Oracle WebCenter Content. | |
<$dir1$>, <$dir2$>, <$dir3$> | The directory name at the specified level in the file's path. | dDocType=<$dir1$> dSecurityGroup=<$dir2$> dDocAccount=<$dir3$> If the file path is "f:/docs/public/sales/march.doc" and you have specified the "Directory" value as "f:/docs", the values would be: <$dir1$> = "docs" <$dir2$> = "public" <$dir3$> = "sales" |
<$dUser$> | The user currently logged in. | dDocAuthor=<$dUser$> If "administrator" is logged in, then |
<$extension$> | The file extension of the file. | dDocTitle=<$filename$>.<$extension$> If the file path is "d:/salesdocs/sample.doc", then |
<$filename$> | The name of the file. | dDocName=<$filename$> If the file path is "d:/salesdocs/sample.doc", then |
<$filepath$> | The entire directory path of the file, including the file name. | xPath=<$filepath$> If the file path is "c:/docs/public/acct/sample.doc", then |
<$filesize$> | The size of the file (in bytes). | xFileSize=<$filesize$> For a 42KB file, |
<$filetimestamp$> | The date and time the file was last modified. | dInDate=<$filetimestamp$> If the last modified date is September 13, 2001 at 4:03 pm, then |
 | The URL of the file, based on the values of the physical file root and relative web root. | |
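To make the directory-level variables concrete, the following sketch derives a few of the mapping values from a file path and the Directory value, following the f:/docs example in the table. The function names are hypothetical, forward-slash paths are assumed, and treating the file name as the name without its extension is an inference from the dDocTitle example.

```python
from pathlib import PurePosixPath

def mapping_variables(file_path, directory):
    """Derive dirN, filename, extension, and filepath values for one file."""
    path = PurePosixPath(file_path)
    root = PurePosixPath(directory)
    # Level 1 is the last component of the Directory value; deeper levels follow.
    levels = [root.name, *path.relative_to(root).parent.parts]
    variables = {f"dir{i}": name for i, name in enumerate(levels, start=1)}
    variables["filename"] = path.stem                  # assumed: name without extension
    variables["extension"] = path.suffix.lstrip(".")
    variables["filepath"] = str(path)
    return variables

def render(template, variables):
    """Replace each <$name$> placeholder in a mapping value with its resolved value."""
    for name, value in variables.items():
        template = template.replace(f"<${name}$>", value)
    return template
```

With the path f:/docs/public/sales/march.doc and the Directory value f:/docs, this yields dir1 = docs, dir2 = public, and dir3 = sales, matching the table.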
To create a batch load file from the BatchBuilder window:
Start the Batch Loader utility:
Windows: Choose Start, then Programs, then Content Server, then instance_name, then Utilities, then BatchLoader.
UNIX: Go to the DomainHome/ucm/cs/bin/ directory, type ./BatchLoader in a shell window, and press the RETURN key on your keyboard.
In the login window, enter the Content Server administrator user name and password, then click OK.
In the Batch Loader window, choose Options, then Build Batch File.
In the Directory field on the BatchBuilder window, enter the location of the files to be included in the batch load file.
In the Batch Load File field, enter the path and file name for the batch load file. You can click the Browse button to navigate to and select the directory and file.
From the Mapping list, select a mapping file. To create a new mapping file or edit an existing one, see Section 5.2.4.
Optional: In the File Filter field, enter filter settings to include or exclude particular files from the batch load file.
Optional: To batch load a read-only external collection, choose External, and select the external collection options.
Click Build.
When the build process is complete, click OK.
Open the batch load file in a text editor and double-check the file records.
To save the current batch load file settings as the default, choose Options, then Save Configuration.
To create a mapping file:
Display the BatchBuilder window.
Click Edit next to the Mapping field.
In the BatchBuilder Mapping List window, click Add.
In the Add BatchBuilder Mapping window, enter a name and description for the mapping file, and click OK.
In the Edit BatchBuilder Mapping window, click Add.
In the Add/Edit BatchBuilder Mapping Field window, enter a metadata field name to be defined. For example, enter dDocName for the Content ID field, or xComments for the Comments field.
Enter the value for the metadata field.
Type any constant text and Idoc script directly in the Value field. For example, to set Document as the Type for all documents in the batch load file, enter dDocType in the Field field, and enter Document in the Value field. For more information, see Oracle Fusion Middleware Developing with Oracle WebCenter Content.
To add a predefined variable to the Value field, select the variable in the right column and click the << button. For example, to set each document's second-level directory as the Security Group, enter dSecurityGroup in the Field field, and insert the <$dir2$> variable in the Value field.
Caution:
Be careful when choosing predefined variables. Many metadata fields have length limitations and cannot contain certain characters (such as spaces or punctuation marks). See "Managing Content" in Oracle Fusion Middleware Managing Oracle WebCenter Content.
Repeat steps 4 through 8 for as many metadata fields as you want to define.
Click OK to save changes and close the Edit BatchBuilder Mapping window.
The mapping file is saved as MapFileName.hda in the IntradocDir/search/external/mapping/ directory.
Click Close to close the BatchBuilder Mapping List window.
You can create a batch load file by entering the BatchBuilder parameters from a command line rather than entering them in the BatchBuilder window. To create a batch load file from the command line:
Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line, where sysadmin is the user name of the Content Server system administrator:
BatchLoaderUserName=sysadmin
This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader and BatchBuilder applications.
Save and close the file.
Open a command line window and change to the DomainHome/ucm/cs/bin/ directory.
Caution:
Run the BatchBuilder using the same operating system account that runs the Content Server instance. Otherwise, the software might not process your data due to permissions problems.
Enter the following command:
Windows:
BatchLoader.exe -spider -q -ddirectory -mmappingfile -nbatchloadfile
UNIX:
BatchLoader -spider -q -ddirectory -mmappingfile -nbatchloadfile
The following flags can be used with the BatchLoader command to run the BatchBuilder from the command line:
Flag | Required? | Description |
---|---|---|
-spider or /spider | Yes | Runs the BatchBuilder application. |
-q or /q | No | Runs the BatchBuilder in quiet mode in the background. (If the BatchBuilder is run from the command line without this flag, the BatchBuilder window will be displayed.) |
-d or /d | Yes | Directory field value. |
-m or /m | Yes | Mapping field value. |
-n or /n | Yes | Batch Load File field value. |
-e or /e | No | Exclude specified files (Exclude check box selected). |
-i or /i | No | Include specified files (Exclude check box deselected). |
The following example shows the correct syntax to run the BatchBuilder from a Windows command line, where:
Directory = c:/myfiles
Mapping File = MyMappingFile
Batch Load File = c:/batching/batchinsert.txt
Excluded files = *.exe and *.zip
BatchLoader.exe -spider -q -dc:/myfiles -mMyMappingFile -nc:/batching/batchinsert.txt -eexe,zip
The following example shows the correct syntax to run the BatchBuilder from a UNIX command line, where:
Directory = /myfiles
Mapping File = MyMappingFile
Batch Load File = /batching/batchinsert.txt
Excluded files = index.htm and index.html
BatchLoader -spider -q -d/myfiles -mMyMappingFile -n/batching/batchinsert.txt -eindex.htm,index.html
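If the BatchBuilder is launched from a script, the argument list can be assembled programmatically from the flags in the table above. The sketch below only builds the arguments and does not run anything; the function name and parameters are hypothetical, and the flag values are attached directly to the flag letters as in the examples.

```python
def batchbuilder_command(executable, directory, mapping_file, batch_load_file,
                         exclude=None, quiet=True):
    """Build the argument list for running the BatchBuilder from the command line."""
    args = [executable, "-spider"]          # -spider selects the BatchBuilder
    if quiet:
        args.append("-q")                   # suppress the BatchBuilder window
    args += [f"-d{directory}", f"-m{mapping_file}", f"-n{batch_load_file}"]
    if exclude:
        args.append("-e" + ",".join(exclude))  # comma-separated, as in the examples
    return args
```

For the UNIX example above, the call would be batchbuilder_command("BatchLoader", "/myfiles", "MyMappingFile", "/batching/batchinsert.txt", exclude=["index.htm", "index.html"]).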
The Batch Loader uses the information from a batch load file to check in (insert), delete, or update a large number of files on your Content Server instance simultaneously.
You can run the Batch Loader from the standalone application interface or from the command line.
After you run the Batch Loader, the Content Server instance processes files through the Inbound Refinery and the Indexer as it would for any other content item.
To batch load content using the Batch Loader window:
Display the Batch Loader window.
Click Browse, navigate to and select the batch load file.
To change the number of errors that can occur before the Batch Loader stops processing, enter the number in the Maximum errors allowed field.
To delete files from the hard drive after they are successfully checked in or updated, select Clean up files after successful check in.
To create a text file containing the file records that failed during batch loading, select Enable error file for failed revision classes.
Click Load Batch File to start the Batch Loader process.
When the batch load process is complete, a Batch Loader message window is displayed, indicating the number of errors that occurred, if any.
If you enabled the error file, write down the file name shown in the message box.
Click OK.
Correct any problems with the batch load.
To save the current Batch Loader settings as the default, choose Options, then Save Configuration.
You can batch load content by entering the Batch Loader parameters from a command line rather than entering them in the Batch Loader window. To run the Batch Loader from the command line:
Open the DomainHome/ucm/cs/bin/intradoc.cfg file in a text editor, and add the following line, where sysadmin is the user name of the Content Server system administrator:
BatchLoaderUserName=sysadmin
This is required so that the system logs in as the system administrator, because only users who have admin rights have permission to run the Batch Loader application.
Save and close the file.
Open a command line window and go to the DomainHome/ucm/cs/bin/ directory.
Caution:
Run the Batch Loader using the same operating system account that runs the Content Server instance. Otherwise, the software might not process your files due to permissions problems.
Enter the following command:
Windows:
BatchLoader.exe -q -nbatchloadfile
UNIX:
BatchLoader -q -nbatchloadfile
The Batch Loader processes the batch load file, but message boxes will not be shown.
Correct any problems with the batch load.
The following flags can be used with the BatchLoader command from the command line:
Flag | Required? | Description |
---|---|---|
-q or /q | No | Runs the Batch Loader in quiet mode in the background. (If the Batch Loader is run from the command line without this flag, the Batch Loader window will be displayed.) |
-n or /n | Yes | Batch Load File field value. |
-console | No | Echoes all output to the HTML Content Server log and to the console window that is running the Batch Loader. For details, see Section 5.3.6. |
The following example shows the correct syntax to run the Batch Loader from a Windows command line, where the batch load file is c:/batching/batchinsert.txt:
BatchLoader.exe -q -nc:/batching/batchinsert.txt
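A scheduled job might wrap the same command in a short script. The following sketch builds the argument list from the flags in the table and runs it from the bin directory; the function names are hypothetical, and because this section does not describe the exit status of the Batch Loader, the sketch returns it without interpreting it.

```python
import subprocess

def batch_loader_command(executable, batch_load_file, quiet=True, console=False):
    """Build the argument list for running the Batch Loader from the command line."""
    args = [executable]
    if quiet:
        args.append("-q")            # no Batch Loader window or message boxes
    args.append(f"-n{batch_load_file}")
    if console:
        args.append("-console")      # echo output to the console window as well
    return args

def run_batch_loader(bin_dir, args):
    """Run the command with bin_dir as the working directory and return the exit status.

    Passing an absolute path for the executable avoids platform differences
    in how a relative program name is resolved.
    """
    return subprocess.run(args, cwd=bin_dir, check=False).returncode
```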
Occasionally, you may need to use remote access when managing your Content Server instance. This does not necessarily mean that remote terminal access is required. However, you must have the ability to submit commands to the server from a remote location.
Combining remote access with the IdcCommand utility provides a powerful toolset and an easy way to check in a large number of files to your instance. To take advantage of this functionality, you will need to properly set up the workstation to submit commands and be able to use the IdcCommand utility with a batch load command file.
This section covers the following topics:
A batch load command file contains a set of commands for each file that is loaded. If you are loading a large number of files, the command file may contain hundreds of lines. Using an editing tool can simplify the task of creating the numerous required lines. For example, the procedure for Preparing for Remote Batch Loading shows how you can prepare a batch load command file using the editing and mail merge features of Microsoft Office.
The following is an example Batch Load command file:
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=thisfile
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=filename
primaryFile:Path=pathtothefile/primaryfilename
xComments=Initial Check In
@end
<<EOD>>
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=99.tif
dDocType=Native
dSecurityGroup=Internal
dDocAuthor=sysadmin
primaryFile=350.afp
primaryFile:path=/lofs/invoices/350.afp
xComments=Initial Check In
@end
<<EOD>>
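Because a command file can run to hundreds of lines, a quick structural check before uploading may save a failed run. The following Python sketch splits command file text into records and reports missing fields. It assumes one name=value pair per line, with each record closed by <<EOD>>; the function names and the list of fields treated as required are assumptions for illustration, not product rules.

```python
def parse_command_file(text):
    """Split batch load command file text into one dict of fields per record."""
    records = []
    for chunk in text.split("<<EOD>>"):
        fields = {}
        for line in chunk.splitlines():
            line = line.strip()
            # Skip blank lines, the @Properties / @end markers, and
            # anything that is not a name=value pair.
            if not line or line.startswith("@") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            fields[name] = value
        if fields:
            records.append(fields)
    return records


def missing_fields(record, required=("IdcService", "dDocTitle", "primaryFile")):
    """Return the names in `required` (an illustrative list) absent from a record."""
    return [name for name in required if name not in record]


if __name__ == "__main__":
    sample = (
        "@Properties LocalData\n"
        "IdcService=CHECKIN_UNIVERSAL\n"
        "dDocTitle=thisfile\n"
        "primaryFile=filename\n"
        "@end\n"
        "<<EOD>>\n"
    )
    for number, record in enumerate(parse_command_file(sample), start=1):
        print(number, missing_fields(record))
```

Field names are compared exactly as written, which matches the case sensitivity of field names in batch load files.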
You can perform batch loading from remote locations. The following procedure is written for a Microsoft Windows operating system and contains these main stages:
Configure the local computer
Test the configuration for the remote workstation
Create a batch load command file
Execute the upload
To configure the local computer:
Open Windows Explorer.
Create a working directory (for example, C:\working_dir).
In the working directory, create one or more directories for the Content Server instances you will be accessing (for example, C:\working_dir\development and C:\working_dir\contribution). These directories can be referred to as DomainHomeName.
In each DomainHomeName directory, create a cmdfiles subdirectory.
From the remote Content Server instance, copy the following directories from MW_HOME\user_projects\domains\Domain_Name\ucm\cs into their respective DomainHomeName directories (in this case C:\working_dir\development and C:\working_dir\contribution):
working_dir\DomainHomeName\ucm\cs\bin
working_dir\DomainHomeName\ucm\cs\config
From the remote Content Server instance, copy the following directories (and their files) to your working directory:
working_dir\idc\bin
working_dir\idc\components (copying the CSDms and NativeOsUtils component files should be sufficient)
working_dir\idc\config
working_dir\idc\jlib
working_dir\idc\resources\core\lang
working_dir\idc\resources\core\table
working_dir\idc\resources\core\config
Using a text editor, open the DomainHomeName\ucm\cs\bin\intradoc.cfg file on your local system and update the IntradocDir, IdcHomeDir, and WeblayoutDir configuration variables to match your directory structure. For example:
IntradocDir=working_dir\DomainHomeName\ucm\cs
IdcHomeDir=working_dir\idc
WeblayoutDir=working_dir\DomainHomeName\ucm\cs\weblayout
Using a text editor, open the working_dir\DomainHomeName\ucm\cs\config\config.cfg file on your local system and verify that the following settings are correct:
IntradocServerPort=4444
IntradocServerHostName=HostMachineName
In the remote Content Server instance, add the IP address of the local computer to the Security Filter, using the System Properties utility.
Restart the remote Content Server instance.
To test the configuration for the remote workstation:
In the cmdfiles directory, create a file named pingservertest.hda and add the following lines:
@Properties LocalData
IdcService=PING_SERVER
@end
Open a command prompt and change to your working bin directory (for example, cd C:\working_dir\development\bin).
Issue the following command:
IdcCommand -f ..\cmdfiles\pingservertest.hda -u sysadmin -l ..\pingservertest.log -c server
Confirm the output. If you are successful, you will get the following message from the server.
3/24/04: Success executing service PING_SERVER.
You have completed your setup for remote commands.
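The ping test above can also be scripted so that it is repeatable for each DomainHomeName directory. The following Python sketch writes the ping file, assembles the IdcCommand arguments shown in the procedure, and checks output text for the success message. The helper names are hypothetical, and the check assumes the success message has the wording shown in the sample output.

```python
from pathlib import Path

# Contents of pingservertest.hda, as given in the procedure.
PING_HDA = "@Properties LocalData\nIdcService=PING_SERVER\n@end\n"


def write_ping_file(cmdfiles_dir):
    """Create pingservertest.hda in the given cmdfiles directory."""
    path = Path(cmdfiles_dir) / "pingservertest.hda"
    path.write_text(PING_HDA)
    return path


def idc_command_args(hda_file, log_file, user="sysadmin"):
    """Assemble the IdcCommand argument list used in the procedure."""
    return ["IdcCommand", "-f", str(hda_file), "-u", user,
            "-l", str(log_file), "-c", "server"]


def ping_succeeded(output):
    """Return True if the output contains the documented success message."""
    return "Success executing service PING_SERVER" in output


if __name__ == "__main__":
    args = idc_command_args(r"..\cmdfiles\pingservertest.hda", r"..\pingservertest.log")
    print(" ".join(args))
```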
To create a batch load command file:
This procedure uses the editing and mailmerge features of Microsoft Office to create a batch load command file.
Create a file listing of your directory contents:
Open a command prompt and change to the root directory representing the files you intend to load.
Create a file listing, using the following command to redirect the output into a file:
dir /s /b > filelisting.txt
Check your filelisting.txt file; it will look something like this:
V:\policies\ADMIN\working_dir_Admin\AbbreviationList.doc
V:\policies\ADMIN\working_dir_Admin\Abbreviations.doc
V:\policies\ADMIN\working_dir_Admin\AbsencePres.doc
V:\policies\ADMIN\working_dir_Admin\AdmPatientCare.doc
V:\policies\ADMIN\working_dir_Admin\AdmRounds.doc
V:\policies\ADMIN\working_dir_Admin\AdverseEvents.doc
V:\policies\ADMIN\working_dir_Admin\ArchivesPermanent.doc
V:\policies\ADMIN\working_dir_Admin\ArchivesRetrieval.doc
V:\policies\ADMIN\working_dir_Admin\ArchivesStandardReq.doc
Note:
When working with batch loads, it is important to note that the file must exist on the server indicated by the primaryFile statement in the batch load command file. Optimally, you should use the same letter to map the directory of files to the server and to your local system. Alternatively, you can copy the directory of files to the server temporarily.
Edit the file listing to create your file name and title data:
Open your filelisting.txt file in Excel.
Using Replace, remove all the directory information, leaving only the file name. Also look for and remove the line for filelisting.txt.
Copy column A (containing the file names) to column B. In this example, the file name is also used as the title, so column B becomes the title column.
Using Replace, remove the file extension from the names in column B.
Insert a new first line and enter filename in the first column and title in the second.
Save the file.
Create a .hda file from the file listing using Mail Merge features:
Open the Word application and create a new document with your set of batch load commands. The following example shows basic batch load commands. You must match your configuration settings when you create your batch load commands.
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle=
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile=d:/temp/working_dir_Admin/
xComments=Initial Check In
@end
<<EOD>>
Select Tools / Letters and Mailings / Mail Merge Wizard and advance through the wizard. Choose the selections below to use your filelisting.txt file as input to the mail merge.
Letter Document (step 1)
Current document (step 2)
Existing List (step 3) and select your Excel spreadsheet as the data source
More Items (step 4), place the title and filename fields into the Word document so that it looks like the following:
@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle="title"
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile=d:/temp/working_dir_Admin/"filename"
xComments=Initial Check In
@end
<<EOD>>
Complete the mail merge (steps 5 and 6). The result is a new Word document with one merge record per page.
Select all of the merged letters and use the Replace feature to remove all of the section breaks.
Save the file as a plain text file to the /cmdfiles directory with the file extension of hda (for example, filelisting.hda).
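As an alternative to the Excel and Word steps above, the same command file could be generated with a short script. The following Python sketch derives the file name and title from each line of the file listing and fills them into the batch load commands shown earlier. The function name, the template constant, and the default server directory are illustrative assumptions; adjust the metadata values to match your configuration.

```python
import ntpath  # Windows path rules, regardless of where the script runs

# Record template based on the example batch load commands in this procedure.
TEMPLATE = """@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle={title}
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile={server_dir}{filename}
xComments=Initial Check In
@end
<<EOD>>
"""


def listing_to_hda(listing_text, server_dir="d:/temp/working_dir_Admin/"):
    """Build command file text from the output of `dir /s /b`."""
    records = []
    for line in listing_text.splitlines():
        path = line.strip()
        if not path:
            continue
        # Keep only the file name, dropping the directory information.
        filename = ntpath.basename(path)
        # The listing file itself is not content to be loaded.
        if filename == "filelisting.txt":
            continue
        # The title is the file name without its extension.
        title = ntpath.splitext(filename)[0]
        records.append(TEMPLATE.format(title=title, server_dir=server_dir,
                                       filename=filename))
    return "".join(records)


if __name__ == "__main__":
    listing = "V:\\policies\\ADMIN\\working_dir_Admin\\AbbreviationList.doc\n"
    print(listing_to_hda(listing), end="")
```

The returned text can be written to cmdfiles/filelisting.hda as plain text. As with the manual procedure, the path given in primaryFile must exist on the server.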
To execute the upload:
Open a command prompt.
Navigate to the working bin directory.
Issue the command:
IdcCommand -f ../cmdfiles/filelisting.hda -u sysadmin -l ../filelisting.log -c server
Your files are checked in to the Content Server repository, and a message is displayed in the command window as each file is checked in.
Depending on the action you plan to perform using the Batch Loader, certain fields are required in the batch load file. If you are updating only the metadata in existing content items, the primaryFile field is not required in the batch load file; for more information see Section 5.1.5.1.
However, if you want to load (insert action) content into the Content Server instance as metadata only, then the primaryFile field is required in the batch load file. Although the field is ignored by the import, the Batch Loader expects it to be defined. If the primaryFile field is missing, you will get an error as follows (or similar):
Please check record number <number>. BatchLoader: unable to check in '<record>' because the required field 'primaryFile' is missing.
To batch load content as metadata only:
Open the Content Server instance config.cfg file:
IntradocDir/config/config.cfg
Add the following configuration variables:
createPrimaryMetaFile=true
AllowPrimaryMetaFile=true
Save and close the config.cfg file.
In the batch load file, add the following fields for each record:
primaryFile=
createPrimaryMetaFile=true
Note that leaving the primaryFile field blank is acceptable. The field is ignored but must be included.
Continue to batch load your content using the Batch Loader procedure or the command line procedure. For more information, see Section 5.3.2 or Section 5.3.3.
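When many metadata-only records are needed, the two extra fields are easy to forget. The following Python sketch builds a file record that always ends with the blank primaryFile field and createPrimaryMetaFile=true, followed by the <<EOD>> marker. The function name and the sample metadata values are illustrative assumptions.

```python
def metadata_only_record(fields):
    """Return one insert record for a metadata-only content item.

    `fields` maps metadata field names (case sensitive) to values.
    """
    reserved = {"Action", "primaryFile", "createPrimaryMetaFile"}
    lines = ["Action=insert"]
    for name, value in fields.items():
        # These fields are set by this helper, so caller values are skipped.
        if name in reserved:
            continue
        lines.append(f"{name}={value}")
    # primaryFile is ignored for metadata-only items but must be present.
    lines.append("primaryFile=")
    lines.append("createPrimaryMetaFile=true")
    lines.append("<<EOD>>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(metadata_only_record({
        "dDocName": "Sample1",
        "dDocType": "Document",
        "dDocTitle": "Metadata-only example",
        "dDocAuthor": "sysadmin",
        "dSecurityGroup": "Public",
    }), end="")
```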
Adding the -console switch to the Batch Loader command line causes all output to be echoed to the HTML Content Server log and to the console window that is running the Batch Loader. Alternatively, you can use operating system redirects to send the output to a separate log file.
Important:
The -console switch does not follow standard Windows command line syntax (although this may be corrected in later versions). You must use the -console syntax usually associated with UNIX instead of the /console syntax. With most other command line utilities, both syntaxes will work on both platforms.
Windows command line:
BatchLoader.exe -console -q -nc:/batching/batchinsert.txt
UNIX command line:
BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt
The console output looks similar to the following:
Processed 1 of 4 record.
Processed 2 of 4 records.
Processed 3 of 4 records.
Processed 4 of 4 records.
Done processing batch file 'c:/batching/batchinsert.txt'.
Out of 4 records processed, 4 succeeded and 0 errors occurred.
You can use a redirect symbol on the command line to send the Batch Loader output to a separate log file. The symbol works on both UNIX and Windows. By default, the -console switch sends the Batch Loader's output to stderr. To redirect the output to a different file, use the special redirect symbol 2>.
In the following examples, each command must be entered all on one line.
Windows command line with redirect:
BatchLoader.exe -console -q -nc:/batching/batchinsert.txt 2> batchlog.txt
UNIX command line with redirect:
BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt 2> /logs/CSbatchload.log
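Once the output is redirected to a log file, a script can read the final summary line to decide whether the batch load needs attention. The following Python sketch is a minimal example; it assumes the summary line has the same wording as the sample console output in this section, and the function name is hypothetical.

```python
import re

# Pattern for the summary line, based on the sample console output.
SUMMARY = re.compile(
    r"Out of (\d+) records processed, (\d+) succeeded and (\d+) errors occurred"
)


def summarize_batch_log(log_text):
    """Return the record counts from the summary line, or None if absent."""
    match = SUMMARY.search(log_text)
    if match is None:
        return None
    processed, succeeded, errors = (int(group) for group in match.groups())
    return {"processed": processed, "succeeded": succeeded, "errors": errors}


if __name__ == "__main__":
    sample = (
        "Processed 4 of 4 records.\n"
        "Done processing batch file 'c:/batching/batchinsert.txt'.\n"
        "Out of 4 records processed, 4 succeeded and 0 errors occurred.\n"
    )
    print(summarize_batch_log(sample))
```

A result of None suggests that the run did not reach its summary line, or that the log wording differs from the sample.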
To correct any errors that occur during batch loading:
Choose Administration, then Log Files, then Content Server Logs.
In the Content Server log file, look through the Type column for the word Error.
Read the description to determine the problem.
Fix the error in one of these files:
Batch load file
The error file for the failed content. (This option is available only if you enabled it on the Batch Loader window.) The error file is located in the same directory as the batch load file, with several digits appended to the batch load file name.
Tip:
If you rerun an entire batch load file, content items that have already been checked in will usually fail. This occurs because the release dates of the existing content items will be the same as the ones you are trying to insert.
This section provides some basic guidelines that you can use to improve Batch Loader performance. These suggestions can minimize potentially slow batch load performance when you are checking in a large number of content items. In many cases, proper tuning for batch loading can significantly speed up a slow server.
To minimize batch loading slowdowns, try implementing the following Batch Loader adjustments:
Temporarily disable other activities: for example, shut down Inbound Refinery (see Oracle Fusion Middleware Managing Oracle WebCenter Content) and suspend the automatic update cycle feature of the Repository Manager.
Analyze your database usage during a batch load to help the database query optimizer. Databases have built-in optimizer utilities that can help make database queries more efficient. However, to maximize the efficiency of optimizers, it is necessary to update or re-create the statistics about the physical characteristics of a table and the associated indexes. These characteristics include number of records, number of pages, and the average record length. The optimizers use these statistics to access data.
Each database has a proprietary command that you can use to invoke the statistics update or re-creation process. For example:
For Oracle, use the ANALYZE TABLE COMPUTE STATISTICS command
For SQL Server, use the CREATE STATISTICS statement
For DB2, use the RUNSTATS command
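Because statistics are gathered per table, the commands are usually generated for a list of tables. The following Python sketch produces the Oracle and DB2 command text for each table. The function name, the table names, and the schema name are illustrative assumptions, and the DB2 options shown (WITH DISTRIBUTION AND DETAILED INDEXES ALL) are only one possible choice. SQL Server is omitted because CREATE STATISTICS also needs a statistics name and a column list, which depend on your schema.

```python
def statistics_commands(database, tables, schema=None):
    """Return statistics-update command text for each table.

    Supports "oracle" and "db2" only; this is a sketch, not a complete tool.
    """
    commands = []
    for table in tables:
        if database == "oracle":
            commands.append(f"ANALYZE TABLE {table} COMPUTE STATISTICS")
        elif database == "db2":
            # RUNSTATS expects a schema-qualified table name.
            if schema is None:
                raise ValueError("DB2 requires a schema name")
            commands.append(
                f"RUNSTATS ON TABLE {schema}.{table} "
                "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
            )
        else:
            raise ValueError(f"unsupported database: {database}")
    return commands


if __name__ == "__main__":
    # Table and schema names below are examples only.
    for command in statistics_commands("db2", ["Revisions", "Documents"], schema="IDC"):
        print(command)
```

Review the generated commands with your database administrator before running them, because the appropriate options vary by database version and workload.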
This case study describes very slow batch load performance and the steps taken to diagnose and correct the situation. This information can serve as a model for isolating underlying issues and resolving batch loading performance problems.
A user wanted to load 27,000 content items into the Content Server instance that was running on an AIX server. The DB2 database was running on a separate AIX server. The content items included TIFs as the native files and corresponding PDFs as the web-viewable files. Inbound Refinery generated thumbnails from the native files.
Initially during the batch load, the performance was acceptable with sub-second insert times. However, after a few thousand content items were loaded, the performance began to degrade. Content items started to require a few seconds to load and, eventually, the load time was over 10 seconds per content item.
While the batch load was running, nothing seemed to be wrong with the Content Server instance. It had sufficient memory, the CPU utilization was low (less than 5%), and there were no disk bottlenecks. The Inbound Refinery server was busy, but was processing thumbnails at an acceptable rate.
Two issues were found with the database server:
Two processes were taking turns to update the database. While one process was executing, the second process waited for the other process to release database locks. When the first process completed, the second process executed while the first process waited. The processes in this execute/wait cycle included:
The actual batch load process that was updating the database tables after inserting a content item.
The Content Server process that was updating the database tables, changing the status from GENWWW to DONE after receiving notification that a thumbnail had been completed.
The two processes should not have been contending with each other because they were not updating the same content items. It seemed that the two processes were locking each other out because DB2 had performed lock escalation and was now locking entire database pages instead of single rows.
There were a large number of tablespace scans being performed by both processes.
A two-step solution was used:
Inbound Refinery was shut down to prevent the status update process from competing with the batch loading process. The performance did improve because there was a 2000+ backlog of content items from the completed thumbnails.
A RUNSTATS command was issued on all the Content Server database tables to update the table statistics. This dramatically improved the performance of the batch load. The insert time returned to sub-second and the batch load completed within a short amount of time. It took 21 hours to insert the first 22,000 content items. After updating the table statistics, the remaining 5,000 content items were inserted in 13 minutes.
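The figures above can be converted to average insert times to show the size of the improvement. The following Python sketch does the arithmetic using only the numbers given in the case study; the function name is illustrative.

```python
def seconds_per_item(total_seconds, items):
    """Average insert time in seconds per content item."""
    return total_seconds / items


# 21 hours for the first 22,000 content items.
before = seconds_per_item(21 * 3600, 22_000)
# 13 minutes for the remaining 5,000 content items.
after = seconds_per_item(13 * 60, 5_000)
speedup = before / after

if __name__ == "__main__":
    print(f"{before:.2f} s/item before, {after:.3f} s/item after, "
          f"about {speedup:.0f}x faster")
```

On these figures, the average insert time dropped from roughly 3.4 seconds to under 0.2 seconds per content item, an improvement of about 22 times.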