Perform Document Classification and Key Value Extraction

Use pretrained OCI Document Understanding models to build document classification and key value extraction into your applications without machine learning (ML) or artificial intelligence (AI) expertise. For example, you might use document classification to identify passports, driver licenses, receipts, and invoices.

Note: OCI Document Understanding currently supports only English. See Limits for Document Understanding.

If you have fewer than 10,000 documents for a pretrained model, or fewer than 2,000 documents for a custom model, you can process them in a single data flow. If you exceed these limits, create a separate data flow for each bucket (that is, use a separate dataset for each bucket), and use a Sequence to run the data flows one after another. See Process Data Using a Sequence of Data Flows.
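
The sizing rule above can be sketched in a few lines of Python. This is only a planning aid, not part of Oracle Analytics: the function name `plan_data_flows`, the bucket names, and the document counts are hypothetical, and only the two limits come from this topic.

```python
from typing import Dict, List

# Limits stated in this topic.
PRETRAINED_LIMIT = 10_000
CUSTOM_LIMIT = 2_000


def plan_data_flows(bucket_counts: Dict[str, int], custom_model: bool = False) -> List[List[str]]:
    """Group buckets into data flows (hypothetical helper).

    Returns a list of data flows, each a list of bucket names.
    A single inner list means one data flow is enough; several inner
    lists mean one data flow per bucket, to be run in a Sequence.
    """
    limit = CUSTOM_LIMIT if custom_model else PRETRAINED_LIMIT
    total = sum(bucket_counts.values())
    if total < limit:
        # Fewer documents than the limit: one data flow covers everything.
        return [sorted(bucket_counts)]
    # Otherwise: one data flow (and one dataset) per bucket.
    return [[name] for name in sorted(bucket_counts)]


# Hypothetical bucket names and document counts.
plan = plan_data_flows({"invoices": 6000, "receipts": 5000})
print(plan)
```

With these assumed counts the total reaches the pretrained limit, so the plan contains two data flows to chain in a Sequence.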

Prerequisites:

- Ask your administrator to make sure that your Oracle Analytics instance is integrated with OCI Document Understanding.
- Prepare a dataset that references the documents that you'd like to analyze and upload it to Oracle Analytics. See Prepare Documents to Analyze with an OCI Document Understanding Model.
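
As a rough illustration of the second prerequisite, the following Python sketch writes a one-column CSV file of document references. The column names URL and File Location match the Input Column values selected later in this procedure. The helper name `build_dataset_csv` and the `example.com` values are hypothetical placeholders, and the exact reference format that Oracle Analytics expects is described in Prepare Documents to Analyze with an OCI Document Understanding Model, not here.

```python
import csv
import io
from typing import List


def build_dataset_csv(column: str, locations: List[str]) -> str:
    """Return CSV text with one header column and one row per location (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    for location in locations:
        writer.writerow([location])
    return buffer.getvalue()


# Referencing documents by bucket: one row per bucket (placeholder value).
bucket_csv = build_dataset_csv("URL", ["https://example.com/placeholder-bucket"])

# Referencing documents individually: one row per document (placeholder values).
documents_csv = build_dataset_csv(
    "File Location",
    [
        "https://example.com/placeholder-bucket/passport-001.pdf",
        "https://example.com/placeholder-bucket/passport-002.pdf",
    ],
)

print(bucket_csv)
print(documents_csv)
```

Save either string to a `.csv` file and upload it as a dataset, replacing the placeholders with your real references.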

On the Oracle Analytics Home page, click Create, and then click Data Flow.
Select the dataset that references the documents you want to analyze, and then click Add.

Description of the illustration oci_du_files11.png
In the Data Flow editor, click Add a step (+).
From the Data Flow Steps pane, double-click Apply AI Model, and then select the model to use.

Description of the illustration oci_du_files14.png

For example, you might select Pretrained Document Classification to identify passports.
In Apply AI Model, go to the Inputs section, and configure the Input Column and Input Type parameters.
- If you're referencing your source documents by bucket, in Input Column select URL, and in Input Type select Buckets.
  
  Description of the illustration vision_parameters.png
- If you're referencing your source documents individually, in Input Column select File Location, and in Input Type select Documents.
See Parameter Options for OCI Document Understanding Models.
In the Data Flow editor, click Add a step (+), and then select Save Data.
In Name, enter a name for the output dataset.
For example, you might call the dataset 'Passport Identification Analysis Results'.
In the Save data to field, specify the location for the output dataset.
Click Save, enter a name for the data flow, and click OK.
Click Run Data Flow.
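
As a recap of the Input Column and Input Type choices in the Apply AI Model step, the two combinations can be written as a small lookup. This is only a memory aid in Python: the keys `"bucket"` and `"document"` and the function `input_settings` are hypothetical names, while the parameter values are the ones listed in this procedure.

```python
from typing import Dict

# Parameter values as listed in the Apply AI Model step of this procedure.
INPUT_SETTINGS: Dict[str, Dict[str, str]] = {
    "bucket": {"Input Column": "URL", "Input Type": "Buckets"},
    "document": {"Input Column": "File Location", "Input Type": "Documents"},
}


def input_settings(reference_style: str) -> Dict[str, str]:
    """Return the Inputs settings for a reference style (hypothetical helper)."""
    if reference_style not in INPUT_SETTINGS:
        raise ValueError(f"Unknown reference style: {reference_style!r}")
    return INPUT_SETTINGS[reference_style]


print(input_settings("bucket"))
```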

When the data flow completes the analysis, open the output dataset that you named in the Save Data step (Step 7).

To locate the generated dataset, go to the Oracle Analytics Home page and navigate to Data, then Datasets.
Description of the illustration oci_du_files13.png

For more detail about the generated results, see Output Data Generated for OCI Document Understanding Models.