5 Managing Document Conversion Processing
This chapter includes the following sections:
5.1 Introduction to Document Conversion
This section covers the following topics:
5.1.1 Key Document Conversion Processor Job Settings
The Document Conversion Processor provides automated conversion of non-image electronic documents and attachments such as Microsoft Word, Excel, or PDF documents to image format. Create Document Conversion Processor jobs that specify the following:
-
Which files and attachments to convert, specified by file name filter (for example, *.pdf for PDF files).
-
If a script should be used to customize Document Conversion Processor functions.
-
The format to convert non-image files into: black and white TIFF or color JPEG.
-
If and how an external conversion program should be used to convert documents.
-
If and how non-image documents and attachments should be merged during document conversion.
-
How metadata values should be applied when merging documents.
-
The next post-processing step (if any) after document conversion. For example, converted batches might go to a Recognition Processor job or the Commit Processor might output them. If no post-processing step is specified, processed batches become available for indexing users to complete.
Figure 5-1 Specifying Documents to Convert on the Document Selection Train Stop
5.1.2 Important Points about Document Conversion
-
Batches undergoing document conversion may contain a mixture of image and non-image document files. If the processor encounters documents that are already in image format, it skips them but still merges documents as configured.
-
In Capture, document conversion is an intermediate batch flow step. This means you must configure how batches reach document conversion (Configuring Batch Flow to a Document Conversion Processor Job), whether an external conversion program should be used (Specifying Settings for Using an External Conversion Program), and the next post-processing step (if any) that occurs after document conversion (Configuring Post-Processing and Monitoring).
-
You can customize the Document Conversion Processor's behavior by incorporating JavaScripts. See Customizing Document Conversion Processing Using Scripts.
-
You can monitor document conversion processing through post-processing options. For example, you can configure separate email notifications for batches that process successfully and for those that encounter system errors, and can rename batches and change their status or priority. For post-processing information, see Configuring Post-Processing and Monitoring. For information about system errors, see Handling Document Conversion Processing System Errors.
5.1.3 Document Conversion Processor Use With Other Batch Processors
The Document Conversion Processor is often used with other processor jobs, as described in the following scenarios:
5.1.3.1 Use Case 1: Processing Expense Reports
-
An end user scans an expense report on a multi-function device (MFD), and the MFD emails the scanned document.
-
The end-user scans the expense report along with receipts into a single PDF file that contains the cover sheet with one or more bar codes.
-
After scanning, the MFD emails the document to a designated email account for expenses processing.
-
The Import Processor imports and processes the email, creates a batch, and forwards it for document conversion.
-
The Import Processor processes the email message, and creates a batch containing two documents:
1) Report PDF (cover sheet/report)
2) Email message, positioned as the last document in the batch
-
After processing the email message, the Import Processor forwards the newly created batch to the Document Conversion Processor queue.
-
The Document Conversion Processor converts the documents in the batch to image format, merges the documents (expense report and email message) into a single document, and forwards the batch for recognition processing.
-
The Document Conversion Processor converts the PDF and email message to image format.
-
The Document Conversion Processor merges the two documents into a single document. Because the email message might contain important information, it is included at the end of the expense document.
-
The Document Conversion Processor forwards the batch to the Recognition Processor queue so that the expense report number can be automatically recognized from the cover sheet.
-
The Recognition Processor performs bar code recognition and indexing of the document, and then forwards the batch to the Commit Processor for committing.
-
The Commit Processor commits the documents in the batch.
5.1.3.2 Use Case 2: Processing Invoices
-
A vendor sends an email to a designated email account with two invoices in PDF format attached along with explanatory text in the email message's body text.
-
The Import Processor imports and processes the email message, creating a batch containing three documents.
The import job is configured to place the email message body as the last document in the batch and to forward the batch for document conversion.
-
The Document Conversion Processor converts, merges, and forwards the batch for committing.
-
The document conversion job converts the PDFs and the email message documents to image format.
-
The document conversion job is configured to merge the last document (email message body) in the batch to each previous document. As a result, the email message body is appended to both invoices.
-
The email message document (third document) is removed from the batch.
-
The Commit Processor commits each invoice along with the original email message so that another process such as Oracle WebCenter Forms Recognition can perform automated data extraction.
5.2 Adding, Copying, or Editing a Document Conversion Job
To add, copy, or edit a document conversion job:
-
In a selected workspace, select the Processing tab.
-
In the Document Conversion Jobs table, click the Add button or select a job and click the Edit button.
You can also copy a document conversion job by selecting it, clicking the Copy button, and entering a new name when prompted. Copying a job lets you quickly duplicate and then modify it.
-
Complete settings on the Document Selection train stop.
-
Enter a name for the job in the Name field. Enter an optional description in the Description field.
-
The Online field is automatically selected. See Activating or Deactivating Document Conversion Jobs.
-
If the document conversion job uses a script, select it in the Script field. This field displays scripts previously added on the Advanced tab and assigned a type of Document Converter Processor. See Customizing Document Conversion Processing Using Scripts.
-
In the Documents to Convert field, select whether to process all non-image documents or only those matching a specified file name filter. For example, to process PDF documents only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can use an asterisk (*) as a wildcard character and separate multiple filters with a comma or semicolon. If you do not want to convert documents, select the Do not convert field. To process documents for specific document profiles, select one or more document profiles listed in the Restrict to Document Profiles field. Select All to process documents for all defined document profiles.
-
In the Attachments to Convert field, select whether to process all non-image document attachments or only those matching a specified file name filter. For example, to process PDF attachments only, choose Selected non-image documents, then enter *.pdf in the File Name Filter field. You can use an asterisk (*) as a wildcard character and separate multiple filters with a comma or semicolon. If you do not want to convert attachments, select the Do not convert field. To process attachments for specific attachment types, select one or more attachment types listed in the Restrict to Attachment Types field. Select All to process attachments for all defined attachment types.
-
Complete settings on the Output Format train stop.
-
You can convert non-image documents to black and white TIFF (default), greyscale TIFF, or color JPEG. If you select JPEG, specify the image quality to use for compression in the JPEG Image Quality field, from 1 (lowest quality) to 99 (highest quality); the default value is 85. Select a resolution in the DPI field; the default is 200.
-
Under Image Settings, in the Blank Page Byte Threshold field, enter a file size value (in bytes). Any image whose size is less than or equal to the threshold is considered a blank page and therefore deleted.
-
Complete settings on the External Conversion train stop. See Specifying Settings for Using an External Conversion Program.
-
Complete settings on the Document Merge Options train stop.
See Specifying How Documents are Merged and Metadata is Assigned.
-
On the Post-Processing train stop, specify what happens after document conversion processing completes, depending on its success.
-
Review settings on the Summary train stop and click Submit to save the job.
-
Configure how batches flow to the Document Conversion Processor job. See Configuring Batch Flow to a Document Conversion Processor Job.
-
Test the document conversion job you created.
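The file name filter described in the Document Selection step can be illustrated with a short sketch. The following Python code is not Capture's implementation; it only shows one plausible reading of the rule that an asterisk (*) acts as a wildcard and that multiple filters are separated by a comma or semicolon. The function names are hypothetical, and case-insensitive matching is an assumption that the text above does not confirm.

```python
import re


def parse_filters(filter_text):
    """Split a filter string on commas or semicolons, dropping empty entries."""
    return [part.strip() for part in re.split(r"[,;]", filter_text) if part.strip()]


def filter_to_regex(pattern):
    """Translate a filter in which only '*' is a wildcard into a regular expression."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    # Assumption: matching ignores case, so *.pdf also matches REPORT.PDF.
    return re.compile(".*".join(literal_parts), re.IGNORECASE | re.DOTALL)


def matches_any(file_name, filter_text):
    """Return True if the file name matches at least one of the filters."""
    return any(
        filter_to_regex(pattern).fullmatch(file_name)
        for pattern in parse_filters(filter_text)
    )


if __name__ == "__main__":
    for name in ["Invoice.PDF", "notes.docx", "scan.tif"]:
        print(name, matches_any(name, "*.pdf; *.docx"))
```

A sketch like this can help when planning filters, for example to check that a pattern such as *.pdf does not also select files named like report.pdf.bak.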
5.3 Configuring Blank Page Detection in a Document Conversion Job
Converting non-image documents to image format can produce blank pages. To have Capture detect and delete blank pages automatically, specify a threshold file size in bytes in the Blank Page Byte Threshold field. Any image whose size is less than or equal to the threshold is considered a blank page and is deleted.
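As an illustration of the threshold rule, the following Python sketch filters a list of converted page images by size. It is a simplified model under stated assumptions, not Capture's code: the function name and the (name, size) tuples are hypothetical, and the only behavior taken from the text is that a page whose size is less than or equal to the threshold is treated as blank.

```python
def drop_blank_pages(pages, threshold_bytes):
    """Keep only pages whose image size is strictly greater than the threshold.

    pages is a list of (page_name, size_in_bytes) tuples (hypothetical structure).
    """
    return [(name, size) for name, size in pages if size > threshold_bytes]


if __name__ == "__main__":
    converted = [("page-1", 48000), ("page-2", 1500), ("page-3", 1501)]
    # With a threshold of 1500 bytes, page-2 counts as blank and is dropped.
    print(drop_blank_pages(converted, 1500))
```

Because the comparison includes equality, a page exactly at the threshold size is removed, so it is worth testing the value against representative blank and nearly blank pages before using it in production.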
5.4 Deleting a Document Conversion Job
Deleting a document conversion job makes it unavailable for batches for which it is set as a post-processing step. If a job specified for post-processing is unavailable, an error results for the batch. You may want to change a job to offline for a time before deleting it, allowing you to resolve unexpected issues with its deletion.
To delete a document conversion job:
5.5 Activating or Deactivating Document Conversion Jobs
When online, a document conversion job runs whenever it is selected on the Post-Processing train stop of a client profile or processor job. You can temporarily stop a job from running (take it offline) or change a deactivated job to run again.
Note:
When reactivating a job, it may take up to a minute for the job to resume processing batches that were queued while the job was offline.
Follow these steps to change a document conversion job to online or offline:
5.6 Specifying How Documents are Merged and Metadata is Assigned
The Document Conversion Processor lets you specify if and how to merge documents in a batch during conversion processing and how to assign metadata values when merging documents.
The merge and metadata assignment options accommodate common document conversion scenarios. For example, the Import Processor might import email messages with PDF attachments, then send them for document conversion. Because the email message is common to each attached PDF document and might be important for processing or indexing each one, you would select one of the document merge options that merges a source document (email message, in this case) with all other target documents (PDF).
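The email scenario above can be modeled in a few lines. This Python sketch assumes the behavior described for the invoice use case: the last document in the batch (the email message) is appended to each preceding document and is then removed from the batch. The function name and the representation of a document as a list of page labels are hypothetical, and metadata assignment is not modeled.

```python
def merge_last_into_each(batch):
    """Append the last document's pages to every earlier document.

    batch is a list of documents; each document is a list of page labels.
    The source (last) document is not kept as a separate document.
    """
    if len(batch) < 2:
        # Nothing to merge: return a copy of the batch unchanged.
        return [list(document) for document in batch]
    *targets, source = batch
    return [list(target) + list(source) for target in targets]


if __name__ == "__main__":
    batch = [
        ["invoice1-p1", "invoice1-p2"],
        ["invoice2-p1"],
        ["email-p1"],
    ]
    for document in merge_last_into_each(batch):
        print(document)
```

Running the example yields two documents, each ending with the email page, which mirrors how both invoices carry the explanatory email text after merging.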
5.7 Configuring Post-Processing and Monitoring
Use a document conversion job's post-processing options to specify what happens after processing completes, depending on processing success.
To configure post-processing settings:
-
In a selected workspace, add or edit a Document Conversion Processor job. See Adding, Copying, or Editing a Document Conversion Job.
-
Click the Post-Processing train stop.
The screen lists the same processing options for successful processing (no system errors) and unsuccessful processing (one or more system errors).
-
In the Batch Processor and Batch Processor Job fields, specify which processing step, if any, occurs after document conversion processing completes. You can choose a batch processor of None (no processing occurs), Commit Processor, Recognition Processor, or Document Conversion Processor. If you choose Recognition Processor or Document Conversion Processor, specify a processor job.
For example, you might send batches with no system errors to the Commit Processor. You might specify None for batches with system errors, then change their batch status or prefix to facilitate further processing in the client.
-
In the email address fields, optionally enter an address to which to send an email after processing completes successfully or fails. While configuring and testing a Document Conversion Processor job, you might set yourself to receive email notifications upon system errors, then later automatically alert an administrator of processing errors.
-
In the remaining fields, specify how to change processed batches.
-
Rename batches by adding a prefix. For example, rename batches that were unsuccessful with the prefix ERR for follow-up.
-
Change batch status or priority. For example, you might change the status of batches with system errors, then create a client profile with batch filtering set to this status to allow qualified users to manually edit and complete batches that encountered errors.
-
Click Submit to save the job.
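The post-processing choices in the steps above amount to two sets of settings, one for batches processed without system errors and one for batches with errors. The following Python sketch models that decision for planning purposes only. Every class, field, and value in it is hypothetical; it does not correspond to a Capture API or configuration file, and email notification is represented only as a stored address.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical settings for one result (success or system error)."""
    next_processor: Optional[str] = None  # None models the "None" choice
    notify_email: Optional[str] = None
    name_prefix: str = ""
    new_status: Optional[str] = None


@dataclass(frozen=True)
class Batch:
    name: str
    status: str = ""


def apply_post_processing(batch, had_system_error, on_success, on_error):
    """Return the updated batch and the next processor, if any."""
    outcome = on_error if had_system_error else on_success
    updated = replace(
        batch,
        name=outcome.name_prefix + batch.name,
        status=outcome.new_status or batch.status,
    )
    return updated, outcome.next_processor


if __name__ == "__main__":
    on_success = Outcome(next_processor="Commit Processor")
    on_error = Outcome(name_prefix="ERR", new_status="Needs Review",
                       notify_email="admin@example.com")
    print(apply_post_processing(Batch("batch-7"), False, on_success, on_error))
    print(apply_post_processing(Batch("batch-7"), True, on_success, on_error))
```

In this model, a successful batch keeps its name and moves to the Commit Processor, while a failed batch is renamed with the ERR prefix, given a new status, and left for manual follow-up in the client.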
5.8 Configuring Batch Flow to a Document Conversion Processor Job
To run a document conversion job, you must configure batches to flow to the job for processing. You do this by setting the Document Conversion Processor job as a post-processing step in a client profile or other processor job. To configure batch flow from:
-
A client profile, see Configuring a Client Profile's Post-Processing.
-
An Import Processor job, see Configuring Post-Processing.
-
A Recognition Processor job, see Configuring Post-Processing and Monitoring.
For example, you might create an Import Processor job that imports email messages and their PDF attachments, then sends them to the Document Conversion Processor for conversion to image format, then sends them to a Recognition Processor job for bar code recognition.
5.9 Specifying Settings for Using an External Conversion Program
The External Conversion train stop lets you specify if and how to use an external conversion program for document conversion.
5.10 Customizing Document Conversion Processing Using Scripts
To customize Document Conversion Processor behavior, incorporate JavaScripts.
5.11 Handling Document Conversion Processing System Errors
A Document Conversion Processor job might encounter system errors such as the following during processing:
-
Errors converting non-images.
-
Errors related to accessing the database, such as a network failure.
In addition to email notification, the Capture system administrator can consult the Document Conversion Processor performance metrics and logs to address system issues.