4 Batch Loading Content
4.1 About Batch Loading
Batch loading a number of files can be automated to save time and effort by using the Batch Loader utility. The following are examples of when to use Batch Loader:
-
You just purchased the WebCenter Content software, and you want to check in all of your existing files with metadata that exists in a database.
-
You have documents checked in to the Content Server repository, and you just created a new custom metadata field. You can use Batch Loader to add the values you specify for the new metadata field to each existing content item.
-
You want to remove a large number of specific files from the system.
Batch Loader performs actions that are specified in a batch load file, which is a text file that tells Batch Loader which actions to perform and what metadata to assign to each content item in the batch.
Note:
For the Batch Loader utility to function correctly with an Oracle WebLogic Server instance, you must have JDBC connection settings configured. For instructions, see Running Administration Applications in Standalone Mode.
4.1.1 About Batch Load File Records
A batch load file is made up of file records, which are sets of name/value pairs that specify the action to perform, or the metadata for individual content items, or both.
Note:
Field names and parameters are case sensitive. They must appear in the batch load file exactly as they appear in the following sections. For example, dDocName is not the same as ddocname, dDocname, or DDOCNAME.
-
Each file record ends with an <<EOD>> (end of data) marker.
-
A pound sign (#) followed by a space at the beginning of a line indicates a comment. The comment character must be followed by a space. For example, # primaryFile=test.txt works properly, but #primaryFile=test.txt will cause errors.
-
The following is an example of a file record:
    # This is a comment
    Action=insert
    dDocName=Sample1
    dDocType=Document
    dDocTitle=Batch Load record insert example
    dDocAuthor=sysadmin
    dSecurityGroup=Public
    primaryFile=links.doc
    dInDate=8/15/2001
    <<EOD>>
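The record rules above (name/value pairs, `# ` comments, and the `<<EOD>>` marker) can be illustrated with a small reader. This is a minimal sketch, not the Batch Loader's own parser: the function name `parse_batch_load_text` is hypothetical, and skipping blank lines is an assumption of the sketch.

```python
from typing import Dict, List

EOD_MARKER = "<<EOD>>"


def parse_batch_load_text(text: str) -> List[Dict[str, str]]:
    """Split batch load file text into file records (name/value dicts)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        stripped = raw_line.strip()
        if not stripped:
            continue  # assumption: blank lines are ignored
        if raw_line.startswith("#"):
            if raw_line.startswith("# "):
                continue  # comment: pound sign followed by a space
            # '#name=value' without the space is documented to cause errors
            raise ValueError(f"malformed comment line: {raw_line!r}")
        if stripped == EOD_MARKER:
            records.append(current)  # end of data closes the record
            current = {}
            continue
        name, sep, value = raw_line.partition("=")
        if not sep:
            raise ValueError(f"not a name/value pair: {raw_line!r}")
        # Field names are case sensitive, so keys are kept as written.
        current[name.strip()] = value.strip()
    return records


sample = """\
# This is a comment
Action=insert
dDocName=Sample1
dDocType=Document
dDocTitle=Batch Load record insert example
dDocAuthor=sysadmin
dSecurityGroup=Public
primaryFile=links.doc
dInDate=8/15/2001
<<EOD>>
"""

records = parse_batch_load_text(sample)
print(records[0]["dDocName"])
```

A reader like this is useful for checking a batch load file before running it, for example to confirm that every record has a dDocName.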
4.1.2 About Batch Load Actions
Valid actions for batch loading are Insert, Delete, and Update.
-
If no action is specified for a file, the system tries to perform an update.
-
Each file record can have only one action, but file records with different actions can be present in the same batch load file.
-
The logic process for each action is different.
4.1.3 About Batch Load Insert Action
The Insert action checks a new file in to the Content Server repository. Figure 4-1 illustrates the insert action.
-
If the Content ID (dDocName) does not exist in the Content Server database, then a new file is created.
-
If the Content ID (dDocName) exists in the Content Server database, and no revision (dRevLabel) is specified, then a new revision is created.
-
If the Content ID (dDocName) and the specified revision (dRevLabel) exist in the Content Server database, then no action is performed.
Figure 4-1 The Insert Action Sequence for Checking In a New File

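The three insert rules listed above can be written as a small decision function. This is only a sketch of the documented rules: the function name and return strings are hypothetical, and the case where a revision is specified that does not yet exist is not covered by the rules, so the sketch reports it as such rather than guessing.

```python
from typing import Optional, Set


def insert_outcome(existing_revisions: Optional[Set[str]],
                   rev_label: Optional[str]) -> str:
    """Return the documented result of an insert action.

    existing_revisions: revision labels already stored for the Content ID,
    or None if the Content ID (dDocName) does not exist.
    rev_label: the dRevLabel from the file record, or None if not specified.
    """
    if existing_revisions is None:
        return "new file created"
    if rev_label is None:
        return "new revision created"
    if rev_label in existing_revisions:
        return "no action"
    # Hypothetical placeholder: the rules above do not describe this case.
    return "not covered by the rules above"


print(insert_outcome(None, None))
print(insert_outcome({"1"}, None))
print(insert_outcome({"1"}, "1"))
```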
4.1.3.1 Insert Requirements
The following table defines the fields required for successful performance of an insert action.
Note:
Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.
-
Field Length: Maximum number of characters permitted in the field.
-
Carried Over: If the next record does not contain this field, the value of this field will be taken from the previous record.
Important:
If you have defined any custom metadata fields as required fields, those fields also need to be defined for an insert action.
Required Items | Field Length | Carried Over | Definition |
---|---|---|---|
Action=insert | N/A | Yes | The command to insert a file. The term Action is case sensitive and must be initial capitalized. |
dDocName | 30 | No | The metadata field named Content ID. |
dDocType | 30 | Yes | The metadata field named Type. |
dDocTitle | 80 | No | The metadata field named Title. |
dDocAuthor | 30 | Yes | The metadata field named Author. |
dSecurityGroup | 30 | Yes | The metadata field named Security Group. |
primaryFile | N/A | N/A | The metadata field named Primary File. The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir is used; if the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.) By default, the length of the Primary File name cannot exceed 80 characters, of which the extension can be at most 8 characters. |
dInDate | N/A | No | The metadata field named Release Date. The dInDate must use the date format of the locale of the user executing the Batch Loader. For example, the US English date format is mm/dd/yy hh:mm:ss am/pm. Time information is optional. If you specify the time, only the hh:mm part is required. The ss and am/pm parts are optional. |
<<EOD>> | N/A | N/A | Indicates the end of data for the file record. |
4.1.3.2 Insert Example
The following code fragments show the batch load file syntax for inserting files. This example shows two file records.
The first file record includes all required fields and the action statement, Action=insert. The second file record does not list the required fields dDocType, dDocAuthor, or dSecurityGroup; the values for these fields are taken from the previous record. The second record also does not specify an action, so the insert action is carried over. Therefore, if the Content ID HR003 does not exist, the file will be inserted. However, if the Content ID does exist, it will not be inserted because the action is insert and not update.
-
First record:
    Action=insert
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Information Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=3/15/97
    <<EOD>>
-
Second record:
    dDocName=HR003
    dDocTitle=Performance Review
    primaryFile=hr003.doc
    dInDate=3/15/97
    <<EOD>>
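The carry-over behavior in this example can be sketched as follows. This is an illustration only, not the Batch Loader's implementation: the function name `apply_carry_over` is hypothetical, and the list of carried-over names is taken from the "Carried Over" column of the requirements table plus the SetFileDir optional parameter.

```python
from typing import Dict, List

# Names documented as carried over to the next file record.
CARRIED_OVER = ("Action", "dDocType", "dDocAuthor", "dSecurityGroup", "SetFileDir")


def apply_carry_over(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fill carried-over fields that a record omits from earlier records."""
    carried: Dict[str, str] = {}
    resolved: List[Dict[str, str]] = []
    for record in records:
        merged = dict(record)
        for name in CARRIED_OVER:
            if name in record:
                carried[name] = record[name]   # remember the latest value
            elif name in carried:
                merged[name] = carried[name]   # inherit from a previous record
        resolved.append(merged)
    return resolved


first = {
    "Action": "insert", "dDocName": "HR001", "dDocType": "Form",
    "dDocTitle": "New Employee Information Form", "dDocAuthor": "Olson",
    "dSecurityGroup": "Public", "primaryFile": "hr001.doc", "dInDate": "3/15/97",
}
second = {
    "dDocName": "HR003", "dDocTitle": "Performance Review",
    "primaryFile": "hr003.doc", "dInDate": "3/15/97",
}

resolved = apply_carry_over([first, second])
print(resolved[1]["Action"], resolved[1]["dDocType"], resolved[1]["dDocAuthor"])
```

Fields that are not carried over, such as dDocName and dDocTitle, must still appear in every record.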
4.1.4 About Batch Load Delete Action
The delete action deletes one or all revisions of an existing file from the Content Server repository. If the specified Content ID (dDocName) does not exist in the Content Server database, no action is performed. Figure 4-2 illustrates the delete action.
4.1.4.1 Delete Requirements
The following table defines the fields required for successful performance of a delete action.
Required Items | Definition |
---|---|
Action=delete | The command to delete a file. The term Action is case sensitive and must be initial capitalized. |
dDocName | The metadata field named Content ID. |
<<EOD>> | Indicates the end of data for the file record. |
4.1.4.2 Delete Example
The following example shows the batch load file syntax for deleting files. This example shows two file records. The first file record will delete all revisions of the Content ID HR001. The second file record will delete revision 2 of the content item HR002.
    Action=delete
    dDocName=HR001
    <<EOD>>
    Action=delete
    dDocName=HR002
    dRevLabel=2
    <<EOD>>
4.1.5 About Batch Load Update Action
The update action updates existing content items. One of the following actions occurs, depending on what items are present in the file record and what content exists in the system:
-
A new revision of an existing content item is created.
-
An existing file's metadata is updated.
-
A new content item is inserted (Action=insert is performed).
Note:
Batch loaded revisions will not enter a workflow even if they meet the criteria for an active workflow.
A new revision is created when one of the following scenarios occurs:
Scenario | Content ID (dDocName) | Revision (dRevLabel) | Release Date in Batch Load file (dInDate) |
---|---|---|---|
Scenario 1 | Exists in Content Server instance | Not specified in the batch load file. | After the release date of the latest revision of the file in the system. |
Scenario 2 | Exists in Content Server instance | Specified in the batch load file, but does not exist in Content Server instance. | After the release date of the latest revision of the file in the system. |
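The two scenarios in the table can be expressed as a single check. This is a sketch of the table only, under the assumption that dates have already been parsed; the function and parameter names are hypothetical and do not come from the product.

```python
from datetime import date
from typing import Optional, Set


def update_creates_new_revision(existing_revisions: Set[str],
                                rev_label: Optional[str],
                                batch_release: date,
                                latest_release: date) -> bool:
    """Return True when an update record matches Scenario 1 or Scenario 2.

    existing_revisions: revision labels stored for the Content ID
    (empty if the Content ID does not exist).
    rev_label: dRevLabel from the batch load file, or None if not specified.
    batch_release: dInDate from the batch load file.
    latest_release: release date of the latest revision in the system.
    """
    if not existing_revisions:
        return False  # both scenarios require an existing Content ID
    label_allows = rev_label is None or rev_label not in existing_revisions
    return label_allows and batch_release > latest_release


# Scenario 1: revision 1 exists, no dRevLabel given, later release date.
print(update_creates_new_revision({"1"}, None, date(1999, 2, 20), date(1998, 9, 26)))
# dRevLabel matches an existing revision, so no new revision is created.
print(update_creates_new_revision({"1"}, "1", date(1999, 2, 20), date(1997, 3, 15)))
```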
4.1.5.1 Update Requirements
The following table defines the fields required for successful performance of an update action.
Required Items | Field Length | Carried Over | Definition |
---|---|---|---|
Action=update | N/A | Yes | The command to update a file. The term Action is case sensitive and must be initial capitalized. |
dDocName | 30 | No | The metadata field named Content ID. |
dDocType | 30 | Yes | The metadata field named Type. |
dDocTitle | 80 | No | The metadata field named Title. |
dDocAuthor | 30 | Yes | The metadata field named Author. |
dSecurityGroup | 30 | Yes | The metadata field named Security Group. |
primaryFile | N/A | N/A | The metadata field named Primary File. If only the metadata is being updated, the primaryFile field is not required but dRevLabel is required. If the optional dRevLabel field is specified and matches a revision label that exists in the Content Server instance, the primaryFile field is not required; the primary file specified for that revision is used. Although dRevLabel is not otherwise a required field, it becomes required if primaryFile is not present. The Primary File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined in the same way as for an insert action; see Insert Requirements. |
dInDate | N/A | No | The metadata field named Release Date. |
<<EOD>> | N/A | N/A | Indicates the end of data for the file record. |
4.1.5.2 Update Example 1
This example assumes that two files are already checked in to the system with the following metadata:
-
HR001 has a Release Date of 9/26/98 and Revision of 1
-
HR002 has a Release Date of 3/15/99 and Revision of 2
The first file record, Content ID HR001, exists in the system, but it does not have a Revision (dRevLabel) specified in the batch load file. Therefore, the Batch Loader will compare the Release Date of the latest revision in the system with the Release Date specified in the batch load file. Since 2/20/99 is after 9/26/98, a new revision 2 for HR001 is added.
The second file record, Content ID HR002, exists in the system and has a Revision (dRevLabel) specified, but Revision 3 does not exist in the system. Therefore, a new revision 3 for HR002 is added.
    Action=update
    dDocName=HR001
    dDocType=Form
    dDocTitle=New Employee Form
    dDocAuthor=Olson
    dSecurityGroup=Public
    primaryFile=hr001.doc
    dInDate=2/20/99
    <<EOD>>
    dDocName=HR002
    dDocTitle=Payroll Change Form
    primaryFile=hr002.doc
    dInDate=2/20/99
    dRevLabel=3
    <<EOD>>
4.1.5.3 Update Example 2
This example assumes that one file is already checked in to the system with the following metadata:
-
Content ID = HR003
-
Release Date = 3/15/97
-
Revision = 1
-
Title = Performance Review
-
Author = Smith
Because Revision 1 of the Content ID HR003 exists in the system (and is not in an active workflow), the revision will be updated with the new Title, Author, and Release Date metadata.
    Action=update
    dDocName=HR003
    dDocType=Form
    dDocTitle=Performance Review Template
    dDocAuthor=Smith
    primaryFile=hr003.doc
    dInDate=2/20/99
    dRevLabel=1
    <<EOD>>
4.1.6 About Optional Batch Load File Parameters
The following table lists the optional parameters you can use in any file record in a batch load file.
In a batch load file, there are two methods you can use to override the primary and alternate formats assigned to a content item check-in:
-
Specifying a value for the primaryFile:format parameter, or specifying a value for the alternateFile:format parameter, or both. However, it is possible to override these values by using the primaryOverrideFormat or alternateOverrideFormat parameters. It is also possible that certain components will force specific formats on certain types of check-ins, or that application functionality in some components forces a different format.
-
Specifying a value for the primaryOverrideFormat parameter, or specifying a value for the alternateOverrideFormat parameter, or both. However, these will only work as parameters in the batch load file if you enable the IsOverrideFormat configuration variable. Note that using this method will override any values that you set for the primaryFile:format and alternateFile:format parameters.
Optional Parameters | Definition |
---|---|
dRevLabel | The metadata field named Revision. Maximum field length is 10 characters. Values must be an integer or comply with the Major/Minor Revision Label Sequence established by the System Properties settings. |
dDocAccount | The metadata field named Accounts. Maximum field length is 30 characters. This field is not carried over to the next file record. Do not specify this field if accounts are not enabled. If accounts are enabled and this field is not specified, dDocAccount will be set to an empty value. |
xComments | The metadata field named Comments. Maximum field length is 255 characters. |
dOutDate | The metadata field named Expiration Date. The dOutDate must use the date format of the locale of the user executing the Batch Loader. For example, the English-US date format is mm/dd/yy hh:mm:ss am/pm. Time information is optional. If you specify the time, only the hh:mm part is required. The ss and am/pm parts are optional. |
primaryFile:path | Specifies the location of the file. If a primaryFile:path value is specified, the value overrides the value specified for the primaryFile parameter. However, the primaryFile:path value is not used to determine the file conversion format. If a value for primaryFile:path is not specified, the location is determined from the primaryFile value. This parameter uses the following syntax: primaryFile:path=complete_path/primary_file_name |
primaryFile:format | Specifies the file format to use for the Primary File. This file format overrides the one specified by the file extension of the file and the value specified for the primaryFile parameter. If a primaryFile:format value is not specified, the file format is determined from the file extension for the primaryFile value. This parameter uses the following syntax: primaryFile:format=application/conversion_type |
alternateFile | The metadata field named Alternate File. The Alternate File name can be a complete path or just the file name. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used; if the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.) |
alternateFile:path | Specifies the location of the alternate file. If an alternateFile:path value is specified, the value overrides the value specified for the alternateFile parameter. However, the alternateFile:path value is not used to determine the file conversion format. If an alternateFile:path value is not specified, the location is determined from the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: alternateFile:path=complete_path |
alternateFile:format | Specifies the file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file and the value specified for the alternateFile parameter. If an alternateFile:format value is not specified, the file format is determined from the file extension for the alternateFile parameter, if a value is specified. Otherwise, by default, the primaryFile value is used for the computation. This parameter uses the following syntax: alternateFile:format=application/conversion_type |
webViewableFile | The webViewableFile name can be a complete path or just the file name. If a webViewableFile value is specified, then the conversion process is not performed. If only a file name is specified, the location of the file is determined as follows: if the SetFileDir optional parameter has been set in this file record or any previous file record, the directory specified in SetFileDir will be used; if the SetFileDir parameter has not been set, the batch load file path is used. (The path is specified in the Batch Load File field on the Batch Loader window.) |
webViewableFile:path | Specifies the location of the web viewable file. If a webViewableFile:path value is specified, the value overrides the value specified for the webViewableFile parameter. However, the webViewableFile:path value is not used to determine the file conversion format. If a value for webViewableFile:path is not specified, the location is determined from the webViewableFile value. This parameter uses the following syntax: webViewableFile:path=complete_path |
webViewableFile:format | Specifies the file format to use for the web viewable file. This file format overrides the one specified by the file extension of the file and the value specified for the webViewableFile parameter. The webViewableFile:format value should be explicitly specified; it is not determined from the webViewableFile value. This parameter uses the following syntax: webViewableFile:format=application/conversion_type |
primaryOverrideFormat | Specifies which file format to use for the Primary File. This file format overrides the one specified by the file extension of the file. This option will only work as a parameter if you enable the IsOverrideFormat configuration variable. You can set this variable by selecting Allow Override Format in the System Properties utility. However, the recommended alternative is to use the primaryFile:format parameter. |
alternateOverrideFormat | Specifies which file format to use for the Alternate File. This file format overrides the one specified by the file extension of the file. This option will only work as a parameter if you enable the IsOverrideFormat configuration variable. You can set this variable by selecting Allow Override Format in the System Properties utility. However, the recommended alternative is to use the alternateFile:format parameter. |
SetFileDir | Specifies the directory where the Primary Files and Alternate Files are located. This field is carried over to the next file record. |
4.1.7 About Custom Metadata Fields
Any custom metadata field that has been defined in the Configuration Manager can be included in a file record.
-
If you have defined any custom metadata fields as required fields, those fields must be defined for an insert action or an update action.
-
If a custom metadata field is not a required field, but it has a default value (even if blank), then the default value will be used if the value is not specified in the batch load file.
-
When specifying a custom metadata field value, precede the field name with an x. For example, if you have a custom metadata field called Location, then the batch load file entry will be xLocation=value.
-
Keep in mind that some add-on products use custom metadata fields. For example, if you have PDF Watermark, you will have created a field called Watermark. To include this field in a batch load file, precede it with an x just like any other custom metadata field (for example, xWatermark).
4.2 Preparing a Batch Load File
4.2.1 About Preparing a Batch Load File
You can use any method you prefer to create a batch load file, if the resulting text file conforms to the batch load file syntax requirements. However, the Batch Loader provides a tool called the BatchBuilder to assist you in creating batch load files.
-
The BatchBuilder creates a batch load file based on the files in a specified directory. The BatchBuilder reads recursively through all the sub-directories to create the batch load file.
-
A mapping file tells the BatchBuilder how to determine the metadata for each file record. You can use the BatchBuilder to create and save custom Mapping Files.
-
You can run the BatchBuilder from the standalone utility interface or from the command line.
-
The BatchBuilder can also be used to create external collections of content, which are indexed and stored in a separate search collection rather than in the Content Server database. You can set up read-only external collections, where users can search for content but cannot update metadata or delete content. This option is recommended when external content is also included in another Content Server instance.
If you plan to use the Batch Loader utility to update and insert a large number of files on your Content Server instance simultaneously, you must create a batch load file. Two of the optional parameters that you can include in your batch load file are primaryOverrideFormat and alternateOverrideFormat. However, these options only work as parameters in the batch load file if you enable the IsOverrideFormat configuration variable. You can set this variable using the System Properties utility.
4.2.2 Mapping Files
Mapping files are text files that have an .hda extension, which identifies them as a type of data file used by the Content Server instance.
For more information on HDA files, LocalData properties, and ResultSets, see Elements in HDA Files in Developing with Oracle WebCenter Content.
4.2.2.1 Mapping File Formats
The metadata mapping can be defined in one of two formats:
-
As name/value pairs in a LocalData definition, a mapping file would look like the following:
    @Properties LocalData
    dDocName=<$filename$>.<$extension$>
    dInDate=<$filetimestamp$>
    @end
-
As a BatchBuilderMapping ResultSet, a mapping file would look like the following:
    @ResultSet SpiderMapping
    2
    mapField
    mapValue
    dDocName
    <$filename$>.<$extension$>
    dInDate
    <$filetimestamp$>
    @end
4.2.2.2 Mapping File Values
The following values can be used in a mapping file:
Value | Description | Example |
---|---|---|
Normal string | All files will have the specified metadata value. | All files will be the Document content type. |
Idoc script | Any supported Idoc script. See Introduction to the Idoc Script Custom Scripting Language in Developing with Oracle WebCenter Content. | |
<$dir1$>, <$dir2$>, <$dir3$> | The directory name at the specified level in the file's path. | dDocType=<$dir1$> dSecurityGroup=<$dir2$> dDocAccount=<$dir3$> If the file path is |
<$dUser$> | The user currently logged in. | dDocAuthor=<$dUser$> If |
<$extension$> | The file extension of the file. | dDocTitle=<$filename$>.<$extension$> If the file path is |
<$filename$> | The name of the file. | dDocName=<$filename$> If the file path is |
| The entire directory path of the file, including the file name. | If the file path is |
<$filesize$> | The size of the file (in bytes). | xFileSize=<$filesize$> For a 42KB file, |
<$filetimestamp$> | The date and time the file was last modified. | dInDate=<$filetimestamp$> If the last modified date is September 13, 2001 at 4:03 pm, then |
| The URL of the file, based on the values of the physical file root and relative web root. | |
4.2.3 Creating a Batch Load File from the BatchBuilder Window
To create a batch load file from the BatchBuilder window:
4.2.5 Creating a Batch Load File from the Command Line
You can create a batch load file by entering the BatchBuilder parameters from a command line rather than entering them in the BatchBuilder window. To create a batch load file from the command line:
The following flags can be used with the BatchLoader command to run the BatchBuilder from the command line:
Flag | Required? | Description |
---|---|---|
-spider or /spider | Yes | Runs the BatchBuilder utility. |
-q or /q | No | Runs the BatchBuilder in quiet mode in the background. (If the BatchBuilder is run from the command line without this flag, the BatchBuilder window will appear.) |
-d or /d | Yes | Directory field value. |
-m or /m | Yes | Mapping field value. |
-n or /n | Yes | Batch Load File field value. |
-e or /e | No | Exclude specified files (Exclude check box selected). |
-i or /i | No | Include specified files (Exclude check box deselected). |
4.2.5.1 Windows Example
The following example shows the correct syntax to run the BatchBuilder from a Windows command line, where:
-
Directory = c:/myfiles
-
Mapping File = MyMappingFile
-
Batch Load File = c:/batching/batchinsert.txt
-
Excluded files = *.exe and *.zip
    BatchLoader.exe -spider -q -dc:/myfiles -mMyMappingFile -nc:/batching/batchinsert.txt -eexe,zip
4.2.5.2 UNIX Example
The following example shows the correct syntax to run the BatchBuilder from a UNIX command line, where:
-
Directory = /myfiles
-
Mapping File = MyMappingFile
-
Batch Load File = /batching/batchinsert.txt
-
Excluded files = index.htm and index.html
    BatchLoader -spider -q -d/myfiles -mMyMappingFile -n/batching/batchinsert.txt -eindex.htm,index.html
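When the BatchBuilder is started from a script, the argument list can be assembled from the documented flags. The following is a minimal sketch that only builds the list and does not run anything; the helper name `batchbuilder_args` is hypothetical, and it uses the `-` flag form with each value attached directly to its flag, as in the examples above.

```python
from typing import List, Sequence


def batchbuilder_args(executable: str,
                      directory: str,
                      mapping: str,
                      batch_file: str,
                      exclude: Sequence[str] = (),
                      quiet: bool = True) -> List[str]:
    """Build the BatchBuilder command line from the documented flags."""
    args = [executable, "-spider"]          # -spider runs the BatchBuilder
    if quiet:
        args.append("-q")                   # quiet mode, no window
    args.append(f"-d{directory}")           # Directory field value
    args.append(f"-m{mapping}")             # Mapping field value
    args.append(f"-n{batch_file}")          # Batch Load File field value
    if exclude:
        args.append("-e" + ",".join(exclude))  # excluded files, comma separated
    return args


unix_args = batchbuilder_args(
    "BatchLoader", "/myfiles", "MyMappingFile", "/batching/batchinsert.txt",
    exclude=["index.htm", "index.html"],
)
print(" ".join(unix_args))
```

A list built this way can be passed to `subprocess.run` if the script should also launch the utility.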
4.3 Running the Batch Loader
4.3.1 About Running the Batch Loader
The Batch Loader uses the information from a batch load file to check in (insert), delete, or update a large number of files on your Content Server instance simultaneously.
-
You can run the Batch Loader from the standalone utility interface or from the command line.
-
After you run the Batch Loader, the Content Server instance processes files through the Inbound Refinery instance and the Indexer as it would for any other content item.
4.3.2 Batch Loading from the Batch Loader Window
To batch load content using the Batch Loader window:
4.3.3 Batch Loading from the Command Line
You can batch load content by entering the Batch Loader parameters from a command line rather than entering them in the Batch Loader window. To run the Batch Loader from the command line:
The following flags can be used with the BatchLoader command from the command line:
Flag | Required? | Description |
---|---|---|
-q or /q | No | Runs the Batch Loader in quiet mode in the background. (If the Batch Loader is run from the command line without this flag, the Batch Loader window will appear.) |
-n or /n | Yes | Batch Load File field value. |
-console | No | Echoes all output to the HTML Content Server log and to the console window that is running the Batch Loader. For details, see Batch Loader -console Command Line Switch. |
4.3.3.1 Windows Example
The following example shows the correct syntax to run the Batch Loader from a Windows command line, where the batch load file is c:/batching/batchinsert.txt:
    BatchLoader.exe -q -nc:/batching/batchinsert.txt
4.3.4 Using the IdcCommand Utility and Remote Access
Occasionally, you may need to use remote access when managing your Content Server instance. This does not necessarily mean that remote terminal access is required. However, you must have the ability to submit commands to the server from a remote location.
Combining remote access with the IdcCommand utility provides a powerful toolset and an easy way to check in a large number of files to your instance. To take advantage of this functionality, you will need to properly set up the workstation to submit commands and be able to use the IdcCommand utility with a batch load command file.
4.3.4.1 Batch Load Command Files
A batch load command file contains a set of commands for each file that is loaded. If you are loading a large number of files, the command file may contain hundreds of lines. Using an editing tool can simplify the task of creating the numerous required lines. For example, the procedure for Preparing for Remote Batch Loading shows how you can prepare a batch load command file using the editing and mail merge features of Microsoft Office.
The following is an example Batch Load command file:
    @Properties LocalData
    IdcService=CHECKIN_UNIVERSAL
    doFileCopy=1
    dDocTitle=thisfile
    dDocType=Native
    dSecurityGroup=Internal
    dDocAuthor=sysadmin
    primaryFile=filename
    primaryFile:path=pathtothefile/primaryfilename
    xComments=Initial Check In
    @end
    <<EOD>>
    @Properties LocalData
    IdcService=CHECKIN_UNIVERSAL
    doFileCopy=1
    dDocTitle=99.tif
    dDocType=Native
    dSecurityGroup=Internal
    dDocAuthor=sysadmin
    primaryFile=350.afp
    primaryFile:path=/lofs/invoices/350.afp
    xComments=Initial Check In
    @end
    <<EOD>>
4.3.4.2 Preparing for Remote Batch Loading
You can perform batch loading from remote locations. The following procedure is written for a Microsoft Windows operating system and contains these main stages:
-
Configure the local computer
-
Test the configuration for the remote workstation
-
Create a batch load command file
-
Execute the upload
4.3.4.2.2 Testing the Configuration for the Remote Workstation
To test the configuration for the remote workstation:
4.3.4.2.3 Creating a Batch Load Command File
This procedure uses the editing and mailmerge features of Microsoft Office to create a batch load command file. To create a batch load command file:
-
Create a file listing of your directory contents:
-
Open a command prompt and change to the root directory representing the files you intend to load.
-
Create a file listing, using the following command to redirect the output in to a file:
dir /s /b > filelisting.txt
-
Check your
filelisting.txt
file; it will look something like this:V:\policies\ADMIN\working_dir_Admin\AbbreviationList.doc V:\policies\ADMIN\working_dir_Admin\Abbreviations.doc V:\policies\ADMIN\working_dir_Admin\AbsencePres.doc V:\policies\ADMIN\working_dir_Admin\AdmPatientCare.doc V:\policies\ADMIN\working_dir_Admin\AdmRounds.doc V:\policies\ADMIN\working_dir_Admin\AdverseEvents.doc V:\policies\ADMIN\working_dir_Admin\ArchivesPermanent.doc V:\policies\ADMIN\working_dir_Admin\ArchivesRetrieval.doc V:\policies\ADMIN\working_dir_Admin\ArchivesStandardReq.doc
Note:
When working with batch loads, it is important to note that the file must exist on the server indicated by the primaryFile statement in the batch load command file. Optimally, you should use the same letter to map the directory of files to the server and to your local system. Alternatively, you can copy the directory of files to the server temporarily.
-
-
Edit the file listing to create your file name and title data:
-
Open your
filelisting.txt
file in Excel. -
Using Replace, remove all the directory information leaving only the file name. Also look for and remove the line for
filelisting.txt
. -
Copy column A (containing the file names) to column B. In this example the file name is also used for the title and Column B will become the title.
-
Using Replace, remove the file extension from the names in column B.
-
Insert a new first line and enter
filename
in the first column andtitle
in the second. -
Save the file.
-
-
Create an
.hda
file from the file listing using Mail Merge features:-
Open the Word application and create a new document with your set of batch load commands. The following example shows basic batch load commands. You must match your configuration settings when you create your batch load commands.
@Properties LocalData IdcService=CHECKIN_UNIVERSAL doFileCopy=1 dDocTitle= dDocType=Native dSecurityGroup=Internal dDocAccount=Policy/Admin dDocAuthor=sysadmin primaryFile=d:/temp/working_dir_Admin/ xComments=Initial Check In @end <<EOD>>
-
Select Tools / Letters and Mailing / Mail Merge Wizard and advance through the wizard. Choose the selections below to use your
filelisting.txt
file as input to the mail merge.-
Letter Document (step 1)
-
Current document (step 2)
-
Existing List (step 3) and select your Excel spreadsheet as the data source
-
More Items (step 4), place the title and filename fields in to the word document so that it looks like the following:
@Properties LocalData IdcService=CHECKIN_UNIVERSAL doFileCopy=1 dDocTitle="title" dDocType=Native dSecurityGroup=Internal dDocAccount=Policy/Admin dDocAuthor=sysadmin primaryFile=d:/temp/working_dir_Admin/"filename" xHistory=Initial Check In @end <<EOD>>
-
-
Complete the mail merge (Steps 5 and 6) and you will have a new Word document with one merge record per page.
-
Edit the letters, selecting all, and use the Replace feature to remove all of the section breaks.
-
Save the file as a plain text file to the /cmdfiles directory with the file extension of hda (for example, filelisting.hda).
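The spreadsheet and mail merge steps can also be scripted. The following is a minimal Python sketch, not a product feature. It assumes that filelisting.txt holds one file path per line, reuses the example metadata values from the sample commands (including the xComments field), and uses hypothetical function names (file_name_and_title, build_hda). Adjust the field values to match your configuration.

```python
from pathlib import PureWindowsPath

# Example values copied from the sample batch load commands; adjust as needed.
TEMPLATE = """@Properties LocalData
IdcService=CHECKIN_UNIVERSAL
doFileCopy=1
dDocTitle={title}
dDocType=Native
dSecurityGroup=Internal
dDocAccount=Policy/Admin
dDocAuthor=sysadmin
primaryFile={source_dir}{filename}
xComments=Initial Check In
@end
<<EOD>>
"""


def file_name_and_title(path_line):
    """Drop the directory to get the file name, then the extension to get the title."""
    name = PureWindowsPath(path_line.strip()).name
    title = PureWindowsPath(name).stem
    return name, title


def build_hda(listing_lines, source_dir="d:/temp/working_dir_Admin/"):
    """Return batch load file text with one record per listed file."""
    records = []
    for line in listing_lines:
        if not line.strip():
            continue  # skip blank lines in the listing
        name, title = file_name_and_title(line)
        if name == "filelisting.txt":
            continue  # the listing file itself is not content
        records.append(
            TEMPLATE.format(title=title, source_dir=source_dir, filename=name)
        )
    return "".join(records)
```

Writing the returned text to a plain text file such as filelisting.hda in the /cmdfiles directory corresponds to the final step of the manual procedure.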
4.3.5 Batch Loading Content as Metadata Only
Depending on the action you plan to perform using the Batch Loader, certain fields are required in the batch load file. If you are updating only the metadata in existing content items, the primaryFile field is not required in the batch load file; for more information, see Update Requirements.
However, if you want to load (insert action) content into the Content Server instance as metadata only, then the primaryFile field is required in the batch load file. Although the field is ignored by the import, the Batch Loader expects it to be defined. If the primaryFile field is missing, you will get an error as follows (or similar):
Please check record number <number>. BatchLoader: unable to check in '<record>' because the required field 'primaryFile' is missing.
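A check run before the batch load can catch this error early. The following Python sketch is an illustration only: the function name records_missing_primary_file is hypothetical, and the sketch assumes the record layout described in About Batch Load File Records (name=value lines, comments that start with a pound sign and a space, and an <<EOD>> marker after each record). It inspects only records that state Action=insert explicitly.

```python
def records_missing_primary_file(batch_text):
    """Return 1-based numbers of Action=insert records that lack primaryFile."""
    missing = []
    fields = {}
    record_number = 0
    for raw_line in batch_text.splitlines():
        line = raw_line.strip()
        if line == "<<EOD>>":
            # The end-of-data marker closes the current record.
            record_number += 1
            if fields.get("Action") == "insert" and "primaryFile" not in fields:
                missing.append(record_number)
            fields = {}
        elif line and not line.startswith("# ") and "=" in line:
            name, _, value = line.partition("=")
            # Field names are case sensitive, so keep them exactly as written.
            fields[name] = value
    return missing
```

An empty list means every explicit insert record defines primaryFile. Any value satisfies the requirement for a metadata-only load, because the import ignores the field.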
To batch load content as metadata only:
4.3.6 Batch Loader -console Command Line Switch
Adding the -console switch to the Batch Loader command line causes all output to be echoed to the HTML Content Server log and to the console window that is running the Batch Loader. Alternatively, you can use operating system redirects to send the output to a separate log file.
Note:
The -console switch does not follow standard Windows command line syntax (although this may be corrected in later versions). You must use the -console syntax usually associated with UNIX instead of the /console syntax. With most other command line utilities, both syntaxes will work on both platforms.
Command Line Example
-
Windows command line:
BatchLoader.exe -console -q -nc:/batching/batchinsert.txt
-
UNIX command line:
BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt
Sample Output
Processed 1 of 4 record.
Processed 2 of 4 records.
Processed 3 of 4 records.
Processed 4 of 4 records.
Done processing batch file 'c:/batching/batchinsert.txt'.
Out of 4 records processed, 4 succeeded and 0 errors occurred.
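If the console output is captured by a script, the final summary line can be parsed to decide whether a run needs attention. The following Python sketch assumes that the summary wording matches the sample output exactly; other wordings (for example, a singular form for one record) are not covered. The parse_summary name is hypothetical.

```python
import re

# Pattern taken from the last line of the sample output.
SUMMARY = re.compile(
    r"Out of (\d+) records processed, (\d+) succeeded and (\d+) errors occurred\."
)


def parse_summary(output):
    """Return (processed, succeeded, errors) from the output, or None if absent."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

A caller might treat a None result or a nonzero error count as a signal to review the log.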
4.3.7 Adding a Redirect
You can use a redirect symbol on the command line to send the Batch Loader output to a separate log file. The symbol works on both UNIX and Windows. By default, the -console switch sends the Batch Loader's output to stderr. To redirect the output to a different file, use the special redirect symbol 2>.
In the following examples, each command must be entered all on one line.
-
Windows command line with redirect:
BatchLoader.exe -console -q -nc:/batching/batchinsert.txt 2> batchlog.txt
-
UNIX command line with redirect:
BatchLoader -console -q -n/u2/apps/batching/batchinsert.txt 2> /logs/CSbatchload.log
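When the Batch Loader is started from a script instead of an interactive shell, the same redirection can be expressed without the 2> symbol. The following Python sketch assumes that the BatchLoader executable can be found on the search path; run_with_stderr_log is a hypothetical helper name.

```python
import subprocess


def run_with_stderr_log(command, log_path):
    """Run command, send its stderr stream to log_path, and return the exit code."""
    with open(log_path, "w") as log_file:
        completed = subprocess.run(command, stderr=log_file)
    return completed.returncode


# Example call, using the arguments from the UNIX command line:
# run_with_stderr_log(
#     ["BatchLoader", "-console", "-q", "-n/u2/apps/batching/batchinsert.txt"],
#     "/logs/CSbatchload.log",
# )
```

Passing the command as a list avoids shell quoting issues, and only stderr is redirected, which mirrors the behavior of 2>.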
4.4 Optimizing Batch Loader Performance
This section provides basic guidelines for improving Batch Loader performance when you check in a large number of content items. In many cases, proper tuning for batch loading can significantly speed up a slow server.
To minimize batch loading slowdowns, try implementing the following Batch Loader adjustments:
-
Temporarily disable other activities, for example by shutting down Inbound Refinery (see Starting and Stopping Oracle WebCenter Content Server and Inbound Refinery Instances in Managing Oracle WebCenter Content) and suspending the automatic update cycle feature of the Repository Manager.
-
Analyze your database usage during a batch load to help the database query optimizer. Databases have built-in optimizer utilities that can help make database queries more efficient. However, to maximize the efficiency of optimizers, it is necessary to update or re-create the statistics about the physical characteristics of a table and the associated indexes. These characteristics include number of records, number of pages, and the average record length. The optimizers use these statistics to access data.
Each database has a proprietary command that you can use to invoke the statistics update or re-creation process. For example:
-
For Oracle, use the ANALYZE TABLE COMPUTE STATISTICS command
-
For SQL Server, use the CREATE STATISTICS statement
-
For DB2, use the RUNSTATS command
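The commands above take a table name, so refreshing statistics for several tables means issuing one statement per table. The following Python sketch generates such statements for Oracle and DB2 only. It is an assumption-laden illustration: the statistics_statement name, the schema name, and the table names in the usage note are placeholders, and the RUNSTATS options shown are one common choice rather than a recommendation from this guide. SQL Server is omitted because CREATE STATISTICS also needs a statistics name and a column list.

```python
def statistics_statement(database, table, schema=None):
    """Return one statistics-refresh statement for the given table."""
    qualified = f"{schema}.{table}" if schema else table
    if database == "oracle":
        return f"ANALYZE TABLE {qualified} COMPUTE STATISTICS"
    if database == "db2":
        # The DB2 command line processor expects a schema-qualified table name.
        return (
            f"RUNSTATS ON TABLE {qualified} "
            "WITH DISTRIBUTION AND DETAILED INDEXES ALL"
        )
    raise ValueError(f"no statement template for {database!r}")
```

For example, statistics_statement("db2", "Revisions", schema="IDC") yields a RUNSTATS command for a table named Revisions in a schema named IDC. Review the generated statements with your database administrator before running them.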
4.5 Best Practice Case Study
This case study describes a case of very slow batch load performance and the steps taken to diagnose and correct the situation. This information can serve as a model for isolating underlying issues and resolving batch loading performance problems.
4.5.1 Background Information
A user wanted to load 27,000 content items into the Content Server instance that was running on an AIX server. The DB2 database was running on a separate AIX server. The content items included TIF files as the native files and corresponding PDF files as the web-viewable files. Inbound Refinery generated thumbnails from the native files.
Initially during the batch load, the performance was acceptable with sub-second insert times. However, after a few thousand content items were loaded, the performance began to degrade. Content items started to require a few seconds to load and, eventually, the load time was over 10 seconds per content item.
4.5.2 Preliminary Troubleshooting
While the batch load was running, nothing seemed to be wrong with the Content Server instance. It had sufficient memory, the CPU utilization was low (less than 5%), and there were no disk bottlenecks. The Inbound Refinery server was busy, but was processing thumbnails at an acceptable rate.
Two issues were found with the database server:
-
Two processes were taking turns to update the database. While one process was executing, the second process waited for the other process to release database locks. When the first process completed, the second process executed while the first process waited. The processes in this execute/wait cycle included:
-
The actual batch load process that was updating the database tables after inserting a content item.
-
The Content Server process that was updating the database tables, changing the status from GENWWW to DONE after receiving notification that a thumbnail had been completed.
The two processes should not have been contending with each other because they were not updating the same content items. It seemed that the two processes were locking each other out because DB2 had performed lock escalation and was now locking entire database pages instead of single rows.
-
There were a large number of tablespace scans being performed by both processes.
4.5.3 Solution
A two-step solution was used:
-
Inbound Refinery was shut down to prevent the status update process from competing with the batch loading process. The performance did improve because there was a 2000+ backlog of content items from the completed thumbnails.
-
A RUNSTATS command was issued on all the Content Server database tables to update the table statistics. This dramatically improved the performance of the batch load, and the insert time returned to sub-second. Inserting the first 22,000 content items had taken 21 hours. After the table statistics were updated, the remaining 5,000 content items were inserted in 13 minutes.
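The figures above can be turned into average per-item insert times to show the size of the improvement. The following Python sketch uses only the numbers reported in this case study; the variable and function names are illustrative.

```python
def seconds_per_item(total_seconds, items):
    """Average time spent on each content item."""
    return total_seconds / items


before = seconds_per_item(21 * 3600, 22_000)  # first 22,000 items in 21 hours
after = seconds_per_item(13 * 60, 5_000)      # remaining 5,000 items in 13 minutes
speedup = before / after

print(f"before: {before:.2f} s/item, after: {after:.3f} s/item, speedup: {speedup:.0f}x")
```

The averages work out to roughly 3.4 seconds per item before the statistics update and about 0.16 seconds afterward, an improvement of about 22 times. The earlier figure is an average over the whole 21 hours, so it includes the initial sub-second inserts as well as the later inserts that took more than 10 seconds.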