Prepare Documents to Analyze with an OCI Document Understanding Model

You use buckets in OCI Object Storage to store the documents that you want to analyze, then create a dataset to access these documents in Oracle Analytics.

You typically store input documents and AI models in the same Oracle Cloud account (tenancy), which makes setup in Oracle Analytics easier.

If your input documents and AI models are stored in different tenancies:

Make sure that the visibility of the storage bucket containing your input documents is public. See Change the visibility of a bucket.
Populate the input dataset for the data flow with individual document URLs instead of a single URL for the OCI bucket where documents are stored.

In a single run, Oracle Analytics data flows can process up to 10,000 documents for pretrained models and 2,000 documents for custom models. If you have more documents than can be processed in one run, create multiple buckets in OCI's Object Storage & Archive Storage, each containing no more than the maximum number of documents. Then create a separate dataset and data flow for each bucket, and use a sequence to run the data flows one after another.
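As a rough illustration of the per-run limits, the following Python sketch splits a list of document names into groups that each fit in one bucket. The function name plan_buckets and the labels "pretrained" and "custom" are hypothetical names chosen for this example; they are not part of Oracle Analytics or OCI.

```python
from typing import List, Sequence

# Per-run limits stated in this topic. The dictionary keys are labels
# invented for this sketch, not Oracle Analytics identifiers.
MAX_DOCS_PER_RUN = {"pretrained": 10_000, "custom": 2_000}


def plan_buckets(doc_names: Sequence[str], model_type: str) -> List[List[str]]:
    """Split doc_names into consecutive groups no larger than the per-run limit."""
    limit = MAX_DOCS_PER_RUN[model_type]
    return [list(doc_names[i:i + limit]) for i in range(0, len(doc_names), limit)]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the grouping.
    docs = [f"invoice_{n:05d}.pdf" for n in range(4_500)]
    for index, group in enumerate(plan_buckets(docs, "custom"), start=1):
        print(f"bucket {index}: {len(group)} documents")
```

Each resulting group corresponds to one bucket, and therefore to one dataset and one data flow in the sequence.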

You can use a private or public bucket that is accessible by the OCI user and that complies with OCI's generic limits on documents. See OCI documentation.

In OCI Console, navigate to Object Storage & Archive Storage, and create a bucket to store your documents.

In the Object Storage & Archive Storage area, click a bucket name, then under the Objects region of the page click Upload and upload your documents.
Make sure that the bucket contains no extraneous files that you don't want to process. Oracle Analytics processes every file in the bucket.
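Before uploading, you might check a list of file names for anything you don't intend to process. The Python sketch below is one possible way to do that. The function name find_extraneous and the set of expected extensions are assumptions made for this example; adjust the set to the file types you actually want to analyze.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set

# Assumed for illustration only: the extensions you expect in the bucket.
EXPECTED_EXTENSIONS: Set[str] = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_extraneous(object_names: Iterable[str],
                    expected: Set[str] = EXPECTED_EXTENSIONS) -> List[str]:
    """Return the names whose extension (case-insensitive) is not expected."""
    return [
        name for name in object_names
        if PurePosixPath(name).suffix.lower() not in expected
    ]
```

Any name the function returns is a candidate to leave out of the bucket, because every file in the bucket is processed.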

For each bucket, add the bucket URL to a comma-separated values (CSV) file.
1. In Object Storage, select the bucket to display the documents in the Objects dialog.
2. Copy the URL from the browser's URL bar.
3. Create a CSV file with fields for ID, Bucket Name, and Bucket URL.
4. Paste the bucket URL into the CSV file as the Bucket URL value.
  
  
  Alternatively, if your input documents and AI models are stored in different tenancies, add them individually to the CSV file.
  
  Create a CSV file with fields for ID, Document Name, and Document URL. For each document in Object Storage, click the ellipsis icon, select View Object Details, and copy the Name value and the URL Path (URI) value.
  
  
  Paste the Name value as Document Name, and paste the URL Path (URI) value as Document URL.
  
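If you prefer to generate the CSV file with a script rather than by hand, a minimal Python sketch might look like the following. It covers both layouts described in this step: one row per bucket, or one row per document. The helper name write_input_csv is hypothetical, and the names and URLs you pass in are assumed to be the values you copied from the OCI Console; the script does not look them up.

```python
import csv
from typing import Iterable, Sequence, Tuple

# Column headings taken from this topic.
BUCKET_HEADER = ("ID", "Bucket Name", "Bucket URL")
DOCUMENT_HEADER = ("ID", "Document Name", "Document URL")


def write_input_csv(path: str,
                    header: Sequence[str],
                    entries: Iterable[Tuple[str, str]]) -> None:
    """Write (name, url) pairs to a CSV file, numbering the rows from 1."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row_id, (name, url) in enumerate(entries, start=1):
            writer.writerow([row_id, name, url])
```

For example, calling write_input_csv("documents.csv", DOCUMENT_HEADER, pairs) with a list of (Name, URL Path) pairs produces the per-document layout, and passing BUCKET_HEADER with (bucket name, bucket URL) pairs produces the per-bucket layout. The csv module quotes any value that contains a comma, so document names with commas stay in a single field.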
In Oracle Analytics, for each bucket that you're using to store your documents, click Create, then Dataset.
Upload the CSV file that you created in Step 3, and save the dataset.
Repeat steps 4 and 5 for each bucket. If you have more documents than can be processed in one run (10,000 for pretrained models, 2,000 for custom models), create multiple buckets that each hold no more than that maximum, and create a separate dataset for each bucket.