Overview of Document Classification and Key Value Extraction
In Oracle Cloud Infrastructure (OCI), Document Understanding provides pretrained AI models that can extract text, tables, and other key data from document files. You perform document classification or key value extraction on a document, then use that extracted data as part of your analysis in Oracle Analytics.
Document Understanding also lets you create custom models for key value extraction and document classification.
In Oracle Analytics, you use data flows to apply the Document Understanding AI models to your data.
- Pretrained Models Supported in Oracle Analytics
  - Document Classification
  - Key Value Extraction (for receipts, invoices, driver IDs, and passports)
- Custom Models Supported in Oracle Analytics
  - Custom Document Classification
  - Custom Key Value Extraction
You must set up and build custom models in the OCI Console before you can use them in Oracle Analytics. First, use OCI Data Labeling to create a labeled dataset for training the model, and then build your custom model. See OCI Document Understanding - Custom Models.
Example Output From a Document Classification Model
In this example, a data flow applies a pretrained document classification model to documents in JPG format to predict whether they are receipts, and writes the results to a dataset. For each document, the dataset includes a "Document Type" value of RECEIPT and a "Confidence" value that indicates the confidence level of the prediction.
Description of the illustration oci_du_files13.png
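Once the output dataset exists, a common next step is to separate confident predictions from those that need a manual check. The following is a minimal sketch in Python, not part of Oracle Analytics itself. It assumes the dataset has been exported as CSV and that "Confidence" is a number between 0 and 1. Only the "Document Type" and "Confidence" column names come from the example above. The "File Name" column, the sample values, the 0.8 threshold, and the split_by_confidence function are hypothetical.

```python
import csv
import io

# Hypothetical sample of the data flow's output dataset, exported as CSV.
# Only the "Document Type" and "Confidence" columns come from the example above;
# the "File Name" column and every value are made up for illustration.
SAMPLE_CSV = """File Name,Document Type,Confidence
receipt_001.jpg,RECEIPT,0.98
receipt_002.jpg,RECEIPT,0.91
scan_003.jpg,RECEIPT,0.42
"""


def split_by_confidence(csv_text, threshold=0.8):
    """Split rows into (accepted, needs_review) lists based on Confidence.

    Assumes Confidence is a number on a 0-to-1 scale; adjust the threshold
    if your dataset reports it differently.
    """
    accepted, needs_review = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["Confidence"] = float(row["Confidence"])
        if row["Confidence"] >= threshold:
            accepted.append(row)
        else:
            needs_review.append(row)
    return accepted, needs_review


accepted, needs_review = split_by_confidence(SAMPLE_CSV)
print("accepted:", [r["File Name"] for r in accepted])
print("needs review:", [r["File Name"] for r in needs_review])
```

In Oracle Analytics you could likely achieve the same split with a filter on the "Confidence" column in a workbook or a later data flow step. The script only illustrates the idea on exported data.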
Before you start:
- Ask your administrator to integrate your Oracle Analytics service with OCI Document Understanding. See Integrate Oracle Analytics with Oracle Cloud Infrastructure Document Understanding.
- In Oracle Analytics, create a connection to your OCI Document Understanding service. See Create a Connection to Your Oracle Cloud Infrastructure Tenancy.