Extracting Feature Content from a Document

To extract specific feature content (such as tables and fields) from a file in PDF, JPG, PNG, or TIFF format, use documentCapture.documentToStructure(options). For a sample, see Extract Feature Content from a Document Synchronously.

Provide the following parameters:

options.file - The document file to extract content from. This file must be located in the NetSuite File Cabinet, and you can specify the file using its internal ID or file path.
options.documentType (optional) - The document type. By specifying the type of document, the service can apply pretrained models that are optimized for that type, which can provide more accurate extraction results. Use values from the documentCapture.DocumentType enum to set this parameter. If you don't specify a value for this parameter, the DocumentType.OTHERS type is used by default.
options.features (optional) - The features to extract from the specified document. Use values from the documentCapture.Feature enum to set this parameter. If you don't specify a value for this parameter, the Feature.TEXT_EXTRACTION and Feature.TABLE_EXTRACTION features are used by default.
options.language (optional) - The language of the specified document. Use values from the documentCapture.Language enum to set this parameter. If you don't specify a value for this parameter, ENG (English) is used by default.
options.timeout (optional) - The timeout period, in milliseconds, to wait for the service to return results. The default value is 30,000 milliseconds (30 seconds), which is also the minimum. You can specify a longer timeout period. If you specify a period shorter than 30,000 milliseconds, the default 30,000-millisecond timeout is used instead.
options.ociConfig (optional) - Oracle Cloud Infrastructure (OCI) credentials to obtain unlimited usage. For more information about providing these credentials, see Using OCI Credentials to Obtain Additional Usage. If you don't specify these credentials, successful calls to documentCapture.documentToStructure(options) draw on the free monthly pool of requests that NetSuite provides.
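The parameter rules above can be sketched as a small helper that assembles the options object. This is a minimal illustration, not part of the N/documentCapture module: buildStructureOptions and MIN_TIMEOUT_MS are hypothetical names, and the helper leaves out any optional parameter you don't supply so that the documented defaults apply. The commented usage assumes that options.features accepts an array of Feature values and that the file is loaded with file.load from the N/file module. The file path shown is a placeholder.

```javascript
// Hypothetical helper (not part of N/documentCapture): builds the options
// object for documentCapture.documentToStructure(options).
const MIN_TIMEOUT_MS = 30000; // documented default and minimum timeout

function buildStructureOptions(file, overrides = {}) {
    const options = { file };
    // Copy only the optional parameters the caller actually provided,
    // so the service falls back to its documented defaults for the rest.
    for (const key of ['documentType', 'features', 'language', 'ociConfig']) {
        if (overrides[key] !== undefined) {
            options[key] = overrides[key];
        }
    }
    if (overrides.timeout !== undefined) {
        // A shorter value would be replaced by the 30,000 ms default anyway,
        // so raise it here to make the effective timeout explicit.
        options.timeout = Math.max(MIN_TIMEOUT_MS, overrides.timeout);
    }
    return options;
}

// Sketch of the intended use inside a SuiteScript 2.1 module
// (assumptions: features is an array; the path is a placeholder):
//
// define(['N/documentCapture', 'N/file'], (documentCapture, file) => {
//     const source = file.load({ id: 'SuiteScripts/sample_invoice.pdf' });
//     const doc = documentCapture.documentToStructure(
//         buildStructureOptions(source, {
//             features: [
//                 documentCapture.Feature.FIELD_EXTRACTION,
//                 documentCapture.Feature.TABLE_EXTRACTION
//             ],
//             timeout: 60000
//         })
//     );
// });
```

Keeping the option assembly separate from the service call lets you check the values you are about to send without consuming usage from the monthly pool.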

The documentCapture.documentToStructure(options) method returns a documentCapture.Document object with the following structure:

{
    mimeType: string,
    pages: {
        fields: Field[],
        lines: Line[],
        tables: Table[],
        words: Word[]
    }
}

The data that's available in this object depends on the features you specify when you call documentCapture.documentToStructure(options). For example, this object includes fields (as documentCapture.Field objects) only when you specify the Feature.FIELD_EXTRACTION feature.
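Because the available data depends on the requested features, code that reads the returned object should not assume that every property is present. The following sketch counts the extracted items defensively. summarizeDocument is a hypothetical helper, and it works on any plain object with the shape described in this topic. It accepts pages as either a single object or an array, which is an assumption made to keep the sketch tolerant rather than a statement about the actual return type.

```javascript
// Hypothetical helper: counts the items found in a Document-like object.
// Missing properties (for example, fields when FIELD_EXTRACTION was not
// requested) are treated as empty.
function summarizeDocument(doc) {
    // Assumption: pages may be a single object or an array of page objects.
    const pages = Array.isArray(doc.pages) ? doc.pages : [doc.pages];
    const summary = {
        mimeType: doc.mimeType,
        pages: 0,
        fields: 0,
        lines: 0,
        tables: 0,
        words: 0
    };
    for (const page of pages) {
        if (!page) {
            continue; // skip a missing pages property
        }
        summary.pages += 1;
        for (const key of ['fields', 'lines', 'tables', 'words']) {
            summary[key] += Array.isArray(page[key]) ? page[key].length : 0;
        }
    }
    return summary;
}
```

In a script, you might log the summary before processing fields or tables, which makes it easy to see when a feature was not included in the request.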

Keep the following considerations in mind:

The documentCapture.documentToStructure(options) method extracts content synchronously and supports documents up to five pages in length. If you want to extract content from longer documents, you must submit an asynchronous task using the N/task module. For an example, see Extract Content from a Document Asynchronously.
Encrypted files are not supported.
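
These considerations can be expressed as a simple check that runs before you call the service. The sketch below is illustrative only: planExtraction is a hypothetical helper, the caller is assumed to already know the page count and whether the file is encrypted, and matching formats by file extension is an assumption (only the four format names listed in this topic are checked, without aliases such as .jpeg or .tif).

```javascript
// Hypothetical pre-check based on the considerations in this topic.
const SYNC_PAGE_LIMIT = 5; // documented limit for synchronous extraction
const SUPPORTED_EXTENSIONS = ['pdf', 'jpg', 'png', 'tiff'];

function planExtraction({ fileName, pageCount, encrypted = false }) {
    // Take the text after the last dot as the extension (assumption).
    const ext = fileName.includes('.')
        ? fileName.split('.').pop().toLowerCase()
        : '';
    if (!SUPPORTED_EXTENSIONS.includes(ext)) {
        return { mode: 'unsupported', reason: 'File format is not supported' };
    }
    if (encrypted) {
        return { mode: 'unsupported', reason: 'Encrypted files are not supported' };
    }
    // Longer documents must be submitted as an asynchronous task (N/task).
    return pageCount > SYNC_PAGE_LIMIT ? { mode: 'async' } : { mode: 'sync' };
}
```

A result of sync means the document fits the limits for documentCapture.documentToStructure(options), and a result of async means the document should be submitted through the N/task module instead.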
