22 Working with Conversions
Note:
Native conversions fail when Inbound Refinery is run as a service on win64 platforms, because services on win64 platforms do not have access to printer services. If you perform native conversions, do not run Inbound Refinery as a service.
For additional information describing the different types of conversion, how and where they are performed, and the advantages of each type, see the "Conversions in WebCenter Content" blog.
This chapter includes the following topics:
22.1 Managing PDF Conversions
Inbound Refinery can convert native files to PDF by either exporting to PDF directly using Oracle Outside In PDF Export (included with Inbound Refinery) or by using third-party applications to output the native file to PostScript and then using a third-party PDF distiller engine to convert the PostScript file to PDF.
PDF conversions require the following components to be installed and enabled on the Inbound Refinery server.
Component Name | Component Description | Enabled on Server |
---|---|---|
PDFExportConverter | Enables Inbound Refinery to use Oracle OutsideIn to convert native formats directly to PDF without the use of any third-party tools. PDF Export is fast, multi-platform, and allows concurrent conversions. | Inbound Refinery Server |
WinNativeConverter | Enables Inbound Refinery to convert native files to a PostScript file with either the native application or OutsideInX and convert the PostScript file to PDF using a third-party distiller engine. This component is for Windows platform only. It replaces the functionality previously made available in the deprecated PDFConverter component. WinNativeConverter offers the best rendition quality of all PDF conversion options when used with the native application on a Windows platform. This does not allow concurrent conversions. WinNativeConverter also enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application. | Inbound Refinery Server |
This section describes how to work with PDF conversions and includes the following topics:
22.1.1 PDF Conversion Considerations
There are several factors to consider when choosing a PDF conversion method. System performance (the time it takes to convert a file to PDF format), the fidelity of the PDF output (how closely it matches the look and formatting of the native file), what native applications are needed (such as Microsoft Word or PowerPoint, used to generate the PostScript file converted by Inbound Refinery), and the platform a conversion application requires should all be taken into consideration.
If the speed of conversion is a primary concern, using PDF Export to convert original files directly to PDF is fastest. In addition to not having to use third-party tools, PDF Export allows concurrent PDF conversions and supports Windows, Linux and UNIX platforms.
If the fidelity of the PDF output is a primary concern, then using the native application to open the original file, output to PostScript, and convert the PostScript to PDF is the best option. However, this method is limited to the Windows platform and it cannot run concurrent PDF conversions.
Table 22-1 compares conversion methods and lists the platforms they support.
Note:
Regardless of the conversion option used, a PDF is a web-ready version of the native format. A converted PDF should not be expected to be an exact replica of the native format. Many factors such as font substitutions, complexity and format of embedded graphics, table structure, or issues with third-party distiller engines may cause the PDF output to differ from the native format.
Table 22-1 PDF Conversion Methods
Conversion Method | Performance | Fidelity | Supported Platforms | Concurrent PDF Conversions |
---|---|---|---|---|
PDF Export | Best | Good | Windows/UNIX | Yes |
3rd-Party Native Applications | Good | Best | Windows | No |
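The trade-offs in Table 22-1 can be read as a simple selection rule. The following Python sketch is illustrative only: the function `choose_pdf_method` and its parameters are hypothetical and are not part of Inbound Refinery. It encodes nothing beyond what the table states.

```python
def choose_pdf_method(platform: str, need_concurrency: bool, prefer_fidelity: bool) -> str:
    """Pick a PDF conversion method using only the facts in Table 22-1.

    Hypothetical helper for illustration; not an Inbound Refinery API.
    """
    on_windows = platform.strip().lower() == "windows"
    # Native-application conversion gives the best fidelity, but it is
    # Windows-only and cannot run concurrent PDF conversions.
    if prefer_fidelity and on_windows and not need_concurrency:
        return "3rd-Party Native Applications"
    # PDF Export has the best performance, supports Windows and UNIX,
    # and allows concurrent PDF conversions.
    return "PDF Export"
```

For example, a Windows refinery that values fidelity over throughput would get the native-application method, while any UNIX refinery, or any refinery that needs concurrent conversions, would get PDF Export.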
22.1.2 Configuring PDF Conversion Settings
This section discusses the following topics regarding PDF conversion settings:
22.1.2.1 Configuring Content Servers to Send Jobs to Inbound Refinery
File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion. When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. Use either the File Formats Wizard or the Configuration Manager to set the file extension, file format, and conversion mappings.
All conversions required for Inbound Refinery are available by default in Content Server. For more information about configuring file extensions, file formats, and conversions in your Content Servers, see About MIME Types and Managing File Types.
Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion upon check-in. One or more refineries must be set up to accept that conversion. Set the conversions that the refinery will accept and queue maximums on the Conversion Listing page. All conversions required for Inbound Refinery are available by default in both Content Server and Inbound Refinery.
For more information about setting accepted conversions, see Setting Accepted Conversions.
22.1.2.2 Setting PDF Files as the Primary Web‐Viewable Rendition
To set PDF files as the primary web‐viewable rendition:
22.1.2.3 Installing a Distiller Engine and PDF Printer
When converting documents to PDF using WinNativeConverter, a distiller engine and PDF printer must be obtained, installed, and configured. This is not necessary when converting to PDF using Outside In PDF Export to open and save documents to PDF.
WinNativeConverter can use several third-party applications to create PDF files of content items. In most cases, a third-party application that can open and print the file is used to print the file to PostScript, and then the PostScript file is converted to PDF using the configured PostScript distiller engine. In some cases, WinNativeConverter can use a third-party application to convert a file directly to PDF.
Note:
A distiller engine is not provided with Inbound Refinery. You must obtain a distiller engine of your choice. The chosen distiller engine must be able to execute conversions from the command line. The procedures in this section use AFPL Ghostscript as an example. This is a free, robust distiller engine that performs both PostScript to PDF conversion and optimization of PDF files during or after conversion.
To install the PDF printer:
22.1.2.4 Configuring Third‐Party Application Settings
To change third‐party application settings:
- Log in to the refinery.
- Select Conversion Settings then Third‐Party Application Settings.
- On the Third-Party Application Settings page, click Options for the third‐party application.
- Change the third‐party application options.
- Click Update to save your changes.
22.1.2.5 Configuring Timeout Settings for PDF Conversions
To configure timeout settings for PDF file generation:
22.1.2.6 Setting Margins When Using Outside In
Inbound Refinery includes Outside In version 8.3.2. When using Outside In to convert graphics to PDF, you can set the margins for the generated PDF from 0–4.23 inches or 0–10.76 cm. By default, Inbound Refinery uses 1‐inch margins on the top, bottom, right, and left.
To adjust these margins:
22.2 Managing Tiff Conversions
Tiff conversion enables the following functionality specific to TIFF (Tagged Image File Format) files:
- Creation of a managed PDF file from a single or multiple-page TIFF file.
- Creation of a managed PDF file from multiple TIFF files that have been compressed into a single ZIP file.
- OCR (Optical Character Recognition) during TIFF-to-PDF conversion. This enables indexing of the text within checked-in TIFF files, so that users can perform full-text searches of these files.
The TiffConverter component is supported on Windows only. For information on file formats and languages that can be converted by PdfCompressor, see the documentation provided by CVISION.
Note:
The TiffConverter component requires CVISION CVista PdfCompressor to perform TIFF-to-PDF conversion with OCR. PdfCompressor is not provided with the TiffConverter component. You must obtain PdfCompressor from CVISION.
TIFF conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
TiffConverter | Enables Inbound Refinery to convert single or multipage TIFF files to PDF complete with searchable text. | Inbound Refinery Server |
TiffConverterSupport | Enables Content Server to support TIFF to PDF conversion. | Content Server |
22.2.1 Configuring Content Servers to Send Jobs for Tiff Conversion
File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options. Installing and enabling the TiffConverterSupport component on a Content Server adds three TIFFConversion options on the File Formats Wizard page.
For a content item to be processed by Inbound Refinery, its file extension (for example, TIF or TIFF) must be mapped to a format name associated with the TIFFConversion conversion method. The added conversion options for Tiff Converter are not automatically mapped. They must be mapped manually. The following topics describe how to set the mappings:
22.2.1.1 Using the File Formats Wizard for Tiff Conversion
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. You can convert TIFF to PDF with OCR or TIFF to PDF without OCR.
To convert TIFF to PDF with OCR:
- Log in to the Content Server as an administrator.
- From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
- On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (TIFFConversion) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the TIFFConversion conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the image/tiff file format to PASSTHRU, so TIF and TIFF files are not processed by Inbound Refinery.
  Note:
  The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.
- If you have added tifz and tiz file extensions using the Configuration Manager, you can select tifz, tiz on the File Format Wizard page to enable application/zip options in the File Type (conversion name) field menu.
  - Compressed Tiff to PDF (tifz, tiz): Selecting this menu item maps the TIFZ and TIZ file extensions to the graphic/tiff-x-compressed file format and associates the graphic/tiff-x-compressed file format with the TIFFConversion conversion method. When TIFZ or TIZ files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the graphic/tiff-x-compressed file format to PASSTHRU, so TIFZ and TIZ files are not processed by Inbound Refinery.
  - Compressed Tiff to PDF (zip): Selecting this menu item maps the ZIP file extension to the application/zip file format and associates the application/zip file format with the TIFFConversion conversion method. When ZIP files are checked into the Content Server, they are processed by the refinery using Tiff Converter and converted to PDF with OCR. Deselecting this check box sets the application/zip file format to PASSTHRU, so that ZIP files are not processed by Inbound Refinery.
- Click Update to save all changes.
To convert TIFF to PDF without OCR:
- Log in to the Content Server as an administrator.
- From the main menu, choose Administration then Refinery Administration then File Formats Wizard.
- On the File Format Wizard page, select tiff, tif to enable Convert TIFF to PDF (Direct PDFExport) in the File Type (conversion name) field menu. Selecting this menu item maps the TIF and TIFF file extensions to the image/tiff file format and associates the image/tiff file format with the Direct PDFExport conversion method. When TIF or TIFF files are checked into the Content Server, they are processed by the refinery using Outside In PDF Export and converted to PDF without OCR.
  Note:
  When the Convert TIFF to PDF (Direct PDFExport) option is used, only the metadata in the resulting PDF is searchable; the text is not.
- Click Update to save all changes.
22.2.1.2 Using the Configuration Manager for Tiff Conversion
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:
- Log in to Content Server as an administrator.
- From the main menu, choose Administration, then Desktop Client Apps.
- From the apps list, choose Configuration Manager. The Configuration Manager app is started.
- In the Configuration Manager app, choose Options then File Formats.
- To enable single, unzipped TIFF files (TIF and TIFF) to be processed by Inbound Refinery:
  - In the File Formats section, check that the image/tiff file format is added and associated with the TIFFConversion conversion method.
    Note:
    The TIFFConversion conversion method is only available when the TiffConverterSupport component has been installed and enabled, and the Content Server has been restarted.
  - In the File Extensions section, check that the tif and tiff file extensions are added and mapped to the image/tiff file format.
- To enable TIFF files that have been compressed into a single TIFZ or TIZ file to be processed by Inbound Refinery:
  - In the File Formats section, check that the graphic/tiff-x-compressed file format is added and associated with the TIFFConversion conversion method.
  - In the File Extensions section, check that the tifz and tiz file extensions are added and mapped to the graphic/tiff-x-compressed file format.
- To enable TIFF files that have been compressed into a single ZIP file to be processed by Inbound Refinery:
  - In the File Formats section, check that the application/zip file format is added and associated with the TIFFConversion conversion method.
  - In the File Extensions section, check that the zip file extension is added and mapped to the application/zip file format.
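The two-level mapping configured above (file extension to file format, then file format to conversion method) can be pictured as a pair of lookup tables. The following Python sketch is illustrative only. The dictionary and function names are hypothetical, and treating every unmapped extension as PASSTHRU is a simplifying assumption made for the sketch, not a statement about Content Server defaults.

```python
import os

# Extension -> file format, as set in the File Extensions section.
EXTENSION_TO_FORMAT = {
    "tif": "image/tiff",
    "tiff": "image/tiff",
    "tifz": "graphic/tiff-x-compressed",
    "tiz": "graphic/tiff-x-compressed",
    "zip": "application/zip",
}

# File format -> conversion method, as set in the File Formats section.
FORMAT_TO_CONVERSION = {
    "image/tiff": "TIFFConversion",
    "graphic/tiff-x-compressed": "TIFFConversion",
    "application/zip": "TIFFConversion",
}


def conversion_for(filename: str) -> str:
    """Resolve a file name to a conversion method through both mappings.

    Assumption for this sketch: anything not mapped is passed through.
    """
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    file_format = EXTENSION_TO_FORMAT.get(extension)
    return FORMAT_TO_CONVERSION.get(file_format, "PASSTHRU")
```

With these tables, `conversion_for("scan.TIF")` resolves to `TIFFConversion`, while a file with an unmapped extension resolves to `PASSTHRU`.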
22.2.1.3 Tips for Processing Zip Files in Tiff Conversion
The ZIP file extension might be used in multiple ways in your environment. For example, you might be checking in:
- Multiple TIFF files compressed into a single ZIP file for Inbound Refinery to convert to a single PDF file with OCR.
- Multiple file types compressed into a single ZIP file that should not be processed (the ZIP file should be passed through in its native format).
When using the ZIP file extension in multiple ways, Oracle recommends configuring the Content Server to allow the user to choose how ZIP files are processed at check-in. This is referred to as Allow override format on check-in. To enable this Content Server functionality:
Note:
If you are using the upload applet to check in multiple files, the files are compressed into a single ZIP file before being checked in. In this case Oracle also recommends enabling Allow override format on check-in so the user can choose how the ZIP file is processed when uploading multiple TIFFs.
Tip:
When CVista PdfCompressor merges multiple TIFF files from a compressed ZIP file, the input files are added in lexicographic order according to the standard ASCII character set.
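Lexicographic ASCII ordering is not the same as numeric ordering, which matters when page files are numbered. The following Python sketch only illustrates the ordering rule described in the tip. The file names are made up, and Python's built-in string sort is used as a stand-in for ASCII ordering (the two agree for ASCII file names).

```python
def merge_order(names):
    """Return names in lexicographic ASCII order, as described in the tip."""
    return sorted(names)


# Unpadded page numbers: page10 sorts before page2.
unpadded = ["page1.tif", "page2.tif", "page10.tif"]
unpadded_order = merge_order(unpadded)

# Zero-padded page numbers keep the intended page sequence.
padded = ["page001.tif", "page002.tif", "page010.tif"]
padded_order = merge_order(padded)
```

Under this rule the unpadded names come out as `page1.tif`, `page10.tif`, `page2.tif`, so zero-padding page numbers before zipping is one way to keep pages in sequence.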
22.2.2 Configuring Tiff Conversion Settings
This section discusses the following topics regarding conversion settings:
22.2.2.1 Setting Accepted Conversions
When installed on the refinery, the TiffConverter component adds the TIFFConversion option to the Conversion Listing page. This conversion option must be enabled for the refinery to perform conversions on items submitted by the Content Server.
22.2.2.2 Changing Timeout Settings
The timeout settings should reflect the processing time required for the size of TIFF files that are commonly checked in to the Content Server. This is highly variable depending on CPU power and TIFF complexity. Perform these tasks to determine the appropriate timeout values for TIFF files:
- Run and time several representative Inbound Refinery jobs using CVista PdfCompressor alone (without the Inbound Refinery).
- Examine the document history information and evaluate the required processing time.
- Change Inbound Refinery timeout settings accordingly.
Note:
Information about Tiff Converter timeouts is recorded in the Inbound Refinery and agent logs.
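The timing task above can be scripted. The following Python sketch is a minimal illustration under stated assumptions: the command you time is whatever command line you already use to run CVista PdfCompressor on a sample file (it is not shown here because this chapter does not define it), the function names are hypothetical, and the safety factor of 2.0 is an arbitrary starting point, not an Oracle recommendation.

```python
import math
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def suggest_timeout_seconds(durations_seconds, safety_factor=2.0):
    """Scale the slowest observed run by a safety factor, rounded up.

    The result is in seconds; convert it to the unit used on the
    refinery timeout settings page before entering it.
    """
    slowest = max(durations_seconds)
    return math.ceil(slowest * safety_factor)
```

For example, if three representative runs took 30.0, 45.5, and 41.2 seconds, `suggest_timeout_seconds` returns 91 with the default factor.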
To configure timeout settings for Tiff to PDF file generation:
22.2.3 Configuring CVista PdfCompressor
This section discusses the following topics regarding the CVista PdfCompressor:
22.2.3.1 Changing PdfCompressor Settings
These options are specific to CVista PdfCompressor. If the TiffConverter component is not installed, the CVista PdfCompressor Options are not available.
To change the PdfCompressor settings:
- Log in to the refinery.
- Choose Conversion Settings then Third-Party Application Settings.
- On the Third-Party Application Settings page, click Options for CVista PdfCompressor.
- On the CVista PdfCompressor Options page, set the path to the location of the CVista PdfCompressor executable in the appropriate text box.
- Enter the string of parameter values in the parameters option text box. A default option string is set on installation of the TiffConverter component.
- Click Update to save the settings.
The following recommended parameter strings should produce optimal results for each given scenario. If these settings do not produce the intended results, modify these strings by removing or appending settings. For more information on these and other available settings, see the online help provided with CVista PdfCompressor (especially "Appendix A: Command-Line Flags for Compression").
Default CVista PdfCompressor Parameters - OCR Enabled
A default string is set when the TiffConverter component is installed unless a string already exists (if the string was set using a previous version of Tiff Converter). The default string has been optimized for typical PdfCompressor usage with OCR enabled:
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
CVista PdfCompressor Parameters - Horizontal and Vertical OCR Enabled
The following string can be used for typical usage with OCR. It also supports OCR processing of both vertical and horizontal text in the same image (add -ocrtwod):
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
CVista PdfCompressor Parameters - No OCR
The following string can be used for simple conversion (without OCR):
-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize -qualityc 75 -qualityg 75 -rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong
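The three recommended strings share most of their tokens, so it can help to see exactly where they differ before editing one by hand. The following Python sketch is illustrative only: the constant and function names are hypothetical, the strings are copied from this section with plain ASCII hyphens, and the sketch makes no claim about what any individual flag does (see the CVista PdfCompressor help for that).

```python
DEFAULT_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-o -ocrmode 1 -ot 120 -qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)
TWO_DIRECTION_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-o -ocrmode 1 -ot 120 -ocrtwod -lsize 25 -qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)
NO_OCR = (
    "-m -c ON -colorcomptype 2 -mrcquality 5 -mrcColorCompType 0 -linearize "
    "-qualityc 75 -qualityg 75 "
    "-rscdwndpi 300 -rsgdwndpi 300 -rsbdwndpi 300 -cconc -ccong"
)


def added_tokens(base: str, variant: str) -> list:
    """Return the tokens of variant that have no counterpart in base.

    Tokens are matched one-to-one, so repeated values such as 75 or 300
    are only reported when variant contains more of them than base.
    """
    unmatched = base.split()
    extra = []
    for token in variant.split():
        if token in unmatched:
            unmatched.remove(token)
        else:
            extra.append(token)
    return extra
```

Comparing the strings this way shows that the two-direction string adds `-ocrtwod -lsize 25` to the default, and that the default adds `-o -ocrmode 1 -ot 120` to the no-OCR string.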
22.2.3.2 Configuring CVista PdfCompressor OCR Languages
Note:
Changes made in the CVista PdfCompressor user interface do not affect how CVista PdfCompressor functions when called by Tiff Converter.
By default, CVista PdfCompressor uses an English OCR dictionary when performing OCR on TIFF files. However, CVista PdfCompressor can perform OCR on several other languages.
To set up multiple OCR languages and enable the user to choose the OCR language at check-in:
Note:
If the following method is used, language parameters should not be specified or passed to the refinery via the CVista PdfCompressor Options Page.
- Obtain the appropriate current language files by contacting CVISION:
  - A lng file is required for each language.
  - Czech, Polish, and Hungarian also require the latin2.shp file.
  - Russian also requires the cyrillic.shp file.
  - Greek also requires the greek.shp file.
  - Turkish also requires the turkish.shp file.
- Place the CVISION language files in the CVista installation directory. The default location is C:\Program Files\CVision\PdfCompressor xx\ where xx stands for the version number of PdfCompressor.
- Log in to Content Server as an administrator.
- From the main menu, choose Administration then Desktop Client Apps.
- From the apps list, choose Configuration Manager.
- On the Configuration Manager page, click the Information Fields tab.
- If the OCRLang information field has been added, skip this step. If it has not been added:
  - In the Field Info section, click Add.
  - On the Add Custom Info page, in the Field Name field, enter OCRLang. This creates a new information field for CVista language conversion options.
    Note:
    Enter this field name exactly.
  - Click OK.
  - On the Add Custom Info Field page, in the Field Caption field, enter the descriptive caption to be displayed on the Content check-in Form page. For example, OCR Language.
  - From the Field Type list, choose Text.
  - Select the Enable Option List check box.
  - From the Option List Type list, choose Select List Validated.
  - In the Use option list field, enter xOCRLangList.
  - Click Edit next to the Use Option List field.
  - On the Option List page, enter the CVista OCR languages to present as options. The following language names are valid options.
    Note:
    You can use either the English language name or the native equivalent (if listed). However, you must enter the language options exactly as they appear in the following table.
English | Native |
---|---|
Czech | - |
Danish | Dansk |
Dutch | Nederlands |
English | - |
Finnish | Suomi |
French | Français |
German | Deutsch |
Greek | - |
Hungarian | Magyar |
Italian | Italiano |
Norwegian | Norsk |
Polish | Polski |
Portuguese | Português |
Russian | - |
Spanish | Español |
Swedish | Svenska |
Turkish | - |
  - Select the Ignore Case check box.
  - Click OK.
  - In the Default Value field, enter the default OCR language option.
  - Click OK to save the settings and return to the Information Fields tab.
  - Click Update Database Design.
- If the OCRLang Information field has been added, but changes must be made to the languages option list and/or the default language:
  - In the Field Info section, select OCRLang and click Edit.
  - On the Add Custom Info page, click Edit next to the Use Option List field.
  - On the Option List page, delete any unused CVista OCR languages.
  - Click OK.
  - In the Default Value field, enter the default OCR language option.
  - Click OK to save the settings and return to the Information Fields tab.
- Close the Configuration Manager app. When a user checks in a TIFF file, the user can override the default OCR language by selecting any of the OCR languages that were set up.
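Because the option list entries must match the table exactly, it can be worth checking a planned list before typing it into the Option List page. The following Python sketch is illustrative only: the names are hypothetical, and the data is copied from the language table and the language-file requirements in this section. It does not query Content Server or CVista PdfCompressor.

```python
# Valid option names: English names plus native equivalents where listed.
VALID_OCR_LANGUAGES = {
    "Czech", "Danish", "Dansk", "Dutch", "Nederlands", "English",
    "Finnish", "Suomi", "French", "Français", "German", "Deutsch",
    "Greek", "Hungarian", "Magyar", "Italian", "Italiano",
    "Norwegian", "Norsk", "Polish", "Polski", "Portuguese", "Português",
    "Russian", "Spanish", "Español", "Swedish", "Svenska", "Turkish",
}

# Additional .shp file needed on top of the per-language lng file.
EXTRA_SHAPE_FILE = {
    "Czech": "latin2.shp",
    "Polish": "latin2.shp",
    "Polski": "latin2.shp",
    "Hungarian": "latin2.shp",
    "Magyar": "latin2.shp",
    "Russian": "cyrillic.shp",
    "Greek": "greek.shp",
    "Turkish": "turkish.shp",
}


def invalid_options(options):
    """Return the entries that do not match the table exactly."""
    return [name for name in options if name not in VALID_OCR_LANGUAGES]


def extra_shape_files(options):
    """Return the sorted set of .shp files the chosen languages require."""
    return sorted({EXTRA_SHAPE_FILE[name] for name in options if name in EXTRA_SHAPE_FILE})
```

For a planned list of `["English", "Polish", "Russian"]`, `invalid_options` returns an empty list and `extra_shape_files` returns `["cyrillic.shp", "latin2.shp"]`.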
22.3 Managing XML Conversions
XML conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
XMLConverter | Enables Inbound Refinery to produce FlexionDoc and SearchML-styled XML as the primary web-viewable file or as independent renditions, and can use the Xalan XSL transformer to process XSL transformations. | Inbound Refinery Server |
XMLConverterSupport | Enables Content Server to support XML conversions and XSL transformations. | Content Server |
This section discusses the following XML conversion management topics:
22.3.1 Configuring Content Servers to Send Jobs to Inbound Refinery
File extensions, file formats, and conversions are used in Content Server to define how content items should be processed by Inbound Refinery and its conversion add‐ons. Each Content Server must be configured to send files to refineries for conversion.
When a file extension is mapped to a file format and a conversion, files of that type are sent for conversion when they are checked into the Content Server. File extension, file format, and conversion mappings can be configured using either the File Formats Wizard or the Configuration Manager.
Most conversions required for Inbound Refinery are available by default in Content Server. In addition to the default conversions, the following conversions are added to the Content Server when the XMLConverterSupport component is installed.
Conversion | Description |
---|---|
FlexionXML | Used to convert files to XML using the FlexionDoc schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). To send these standard file types to a refinery for conversion to XML using FlexionDoc, their file formats do not need to be re-mapped to the FlexionXML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager. |
SearchML | Used to convert files to XML using the SearchML schema. It applies to file types other than the standard file types included in the list of conversions (for example, Word, PowerPoint, and so on). To send these standard file types to a refinery for conversion to XML using SearchML, their file formats do not need to be re-mapped to the SearchML conversion. This conversion is not available on the File Formats Wizard. It must be mapped using the Configuration Manager. |
XSLT Transformation | After XML Converter converts documents to the FlexionDoc schema, the XSLT conversion allows the resultant XML to be transformed into other XML schema specified by a developer. |
Conversions available in the Content Server should match those available in the refinery. When a file format is mapped to a conversion in the Content Server, files of that format are sent for conversion on check-in. One or more refineries must be set up to accept that conversion.
Most conversions required for Inbound Refinery are available by default. In addition to the default conversions that can be accepted by a refinery, the FlexionXML and SearchML conversions are added to the refinery when the XMLConverter component is installed. The FlexionXML and SearchML conversions are accepted by default.
22.3.2 Setting XML Files as the Primary Web‐Viewable Rendition
To set XML files as the primary web‐viewable rendition:
22.3.4 Setting Up XSL Transformation
Inbound Refinery uses the Xalan XSLT processor and the SAX validator built into the Java virtual machine running Inbound Refinery. To enable transformation, the XMLConverter component must be installed and enabled on the refinery server and the XMLConverterSupport component must be installed and enabled on the Content Server.
To turn on XSL Transformation:
- Log into the refinery server.
- Do one of the following:
  - If the XML rendition is to be the primary web-viewable file, click Conversion Settings then Primary Web Rendition. Enable Convert to XML on the Primary Web-Viewable Rendition Page when it is displayed.
  - If the XML is to be an additional rendition, click Conversion Settings then Additional Renditions. Enable Create XML renditions for all supported formats on the Additional Renditions Page when it is displayed.
- Click XML Options.
- On the XML Options page, enable Process XSLT Transformation and select the XML schema to use from the following options:
  - Produce FlexionDoc XML
  - Produce SearchML
- Click Update to save all changes or Reset to revert to the last saved settings.
To perform XSL transformations, Inbound Refinery must have an XSL template, checked into Content Server, to apply during the transformation. To check in an XSL template to Content Server:
- Create an XSL file. The XSL file specifies how an XML file with a specific Content Type will be transformed to a new XML file. A DTD or schema can be specified for validation and stored in the Content Server, but is not required.
- Check the XSL file into the Content Server and associate it to a Content Type.
  - In the Content check-in Form, select the Content Type from the Type list.
  - Enter the Content ID according to the following convention: Content Type.xsl. For example, if the Content Type is Documents, enter documents.xsl.
  - Enter the XSL file as the Primary File.
  - Check that the Security Group matches any DTD/schema files in the Content Server associated with the XSL file and the native files that are checked into the Content Server.
  - Click Check In.

  When files are checked in with this Content Type, and a FlexionDoc/SearchML XML file is generated by XML Converter or the checked-in file is XML, this XSL file will be used for XSL transformation to a new XML document.
- Repeat these steps for each Content Type to post-process to XML.
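When several Content Types need templates, the Content ID convention above can be applied mechanically. The following Python sketch is illustrative only: the function name is hypothetical, and lowercasing the Content Type is an assumption inferred from the single example in the steps (Documents becomes documents.xsl).

```python
def xsl_content_id(content_type: str) -> str:
    """Build the Content ID for an XSL template from a Content Type.

    Assumption: the Content Type is lowercased, as in the
    Documents -> documents.xsl example in the steps above.
    """
    return f"{content_type.strip().lower()}.xsl"


# One Content ID per Content Type that should be post-processed.
template_ids = [xsl_content_id(name) for name in ["Documents"]]
```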
22.3.4.1 XSLT Errors
When a validation fails, Inbound Refinery collects the errors from the SAX Validation engine, creates an HCSP error page, and attempts to check in the page to Content Server.
For the refinery to check in an error page, manually set up an outgoing provider on Inbound Refinery to the Content Server. The name of the Inbound Refinery provider must match the agent name. For example, if Inbound Refinery is named production_ibr and it is converting files for a Content Server named production_cs, then an outgoing provider named production_cs must be created on the production_ibr Inbound Refinery.
To set up a criteria workflow to be notified regarding XSL transformation failures:
- From the main menu, choose Administration then Desktop Client Apps.
- From the app list, choose Workflow Admin.
- Add a criteria workflow for notification of XSLT transformation failures.
- Add a workflow step with the following properties:
  - Users: specify the users that should be notified.
  - Exit Conditions: select At least this many reviewers, and set the value to 0.
  - Events: For the Entry event, add the following Custom Script Expression:
    <$if dDocTitle like "*XSLT Error"$>
    <$else$>
      <$wfSet("wfJumpEntryNotifyOff", "1")$>
      <$wfExit(0,0)$>
    <$endif$>
For details about using workflows, see Managing Workflows.
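The branching in the Entry event script can be easier to follow when restated outside Idoc Script. The following Python sketch is a paraphrase for illustration, not code that runs in Content Server. It assumes that the Idoc Script `like` operator treats `*` as a simple wildcard, which is approximated here with `fnmatchcase`; the function name and the returned labels are hypothetical.

```python
from fnmatch import fnmatchcase


def entry_action(doc_title: str) -> str:
    """Mirror the Entry event branching for a given document title."""
    if fnmatchcase(doc_title, "*XSLT Error"):
        # Empty branch in the script: the item stays in the step,
        # so the listed users are notified.
        return "notify"
    # Else branch: notification is switched off (wfJumpEntryNotifyOff)
    # and the item leaves the workflow (wfExit).
    return "exit_without_notification"
```

Under that assumption, only items whose title ends in "XSLT Error" remain in the step and trigger a notification; every other item exits without one.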
22.4 Converting Microsoft Office Files to HTML
Inbound Refinery can convert native Microsoft Office files to HTML by using the native Microsoft Office applications installed on a Windows system. Content Server can be installed on either a Windows or UNIX platform, but for Microsoft Office to HTML conversions to work, Inbound Refinery must be configured on the Windows system where the Microsoft Office native applications are installed.
HTML conversion automates opening Microsoft Office files in their native application, saving them as HTML pages, and collecting the HTML output into a compressed ZIP file that is returned to Content Server.
HTML conversion can process the following types of files:
- Microsoft Word 2003 through 2010
- Microsoft Excel 2003 through 2010
- Microsoft PowerPoint 2003 through 2010
- Microsoft Visio 2007
When WinNativeConverter is enabled to work with Inbound Refinery, native Microsoft Office files checked into Content Server are sent to Inbound Refinery for conversion. Inbound Refinery automates the process of converting the files to HTML using the native Microsoft Office applications. If a single HTML page is returned to Content Server, it is used as the web-viewable file. If conversion results in multiple HTML pages, the following files are returned to Content Server:
- An HCSP page as the primary web-viewable rendition
- A ZIP file that includes the HTML output from the Office application
- Optionally, a thumbnail rendition of the native Microsoft Office file
When a user clicks on the web-viewable link in Content Server of a document converted to multiple HTML pages by Inbound Refinery, the HCSP page redirects the server to the HTML rendition.
Microsoft Office to HTML conversions require the following components to be installed and enabled on the specified server.
Component Name | Component Description | Enabled on Server |
---|---|---|
WinNativeConverter | Enables Inbound Refinery to convert native Microsoft Office files created with Word, Excel, PowerPoint and Visio to HTML using the native Office application. | Inbound Refinery Server |
MSOfficeHtmlConverterSupport | Enables Content Server to support HTML conversions of native Microsoft Office files converted by Inbound Refinery and returned to Content Server in a ZIP file. Requires that ZipRenditionManagement component be installed on the Content Server. | Content Server |
ZipRenditionManagement | Enables Content Server access to HTML renditions created and compressed into a ZIP file by Inbound Refinery. | Content Server |
This section discusses how to configure Content Server to work with Microsoft Office to HTML conversions:
22.4.1 Configuring Content Servers to Send Jobs for HTML Conversion
When installed on the refinery, the WinNativeConverter component adds the Word HTML, PowerPoint HTML, Excel HTML, and Visio HTML options to the Conversion Listing page. These conversion options must be enabled for the refinery to perform conversions on items submitted by the Content Server. File formats and conversion methods are used in Content Server to define how content items should be handled by Inbound Refinery and the conversion options.
For a Microsoft Office document to be processed by Inbound Refinery, its file extension must be mapped to a format name that is associated with the HTML Conversion method. The added conversion options for HTML Conversion are not automatically mapped: they must be mapped manually. They can be set either using the File Formats Wizard or the Configuration Manager applet. The Configuration Manager applet gives you greater control over which file extensions are mapped to which conversion options. For details, see the following sections:
22.4.1.1 Using the File Formats Wizard for Microsoft Office Conversions
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the File Formats Wizard. To make changes:
22.4.1.2 Using the Configuration Manager for Microsoft Office Conversions
File formats and conversion methods for Inbound Refinery can be managed in Content Server using the Configuration Manager. To make changes:
- Log in to Content Server as an administrator.
- From the main menu, choose Administration then Desktop Client Apps.
- From the app list, choose Configuration Manager.
- Choose Options then File Formats.
- Select the application format for the Office document type to convert from the Format column. For example, for Microsoft Word, select application/msword.
- Click Edit.
- In the Edit File Format dialog, select the HTML conversion option from the Conversion list appropriate to the selected Office document format. For example, for application/msword, select the conversion option Word HTML.
- Click OK.
- Repeat these steps for all Microsoft Office formats to convert to HTML.
- When finished, click Close to close the File Formats page and then close the Configuration Manager.
- Restart Content Server and Inbound Refinery.