Prepare Documents to Analyze with an OCI Document Understanding Model
You use buckets in OCI Object Storage to store the documents that you want to analyze, then create a dataset to access these documents in Oracle Analytics.
You typically store input documents and AI models in the same Oracle Cloud account (tenancy), which makes setup in Oracle Analytics easier.
If your input documents and AI models are stored in different tenancies:
- Make sure that the visibility of the storage bucket containing your input documents is public. See Change the visibility of a bucket.
- Populate the input dataset for the data flow with individual document URLs instead of a single URL for the OCI bucket where documents are stored.
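As a rough illustration of the second point, the following sketch builds one URL per document and renders them as CSV text that could seed an input dataset. It assumes the common Object Storage URL pattern (region, namespace, bucket, object name). The region, namespace, bucket, object names, and the `document_url` column header are all hypothetical placeholders, so check the URL form and the column name your data flow expects before relying on it.

```python
import csv
import io
from urllib.parse import quote


def build_object_urls(region, namespace, bucket, object_names):
    """Return one Object Storage URL per object name.

    Assumes the URL pattern
    https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/<object>.
    """
    base = f"https://objectstorage.{region}.oraclecloud.com"
    return [
        # quote() percent-encodes spaces and other unsafe characters,
        # but leaves "/" in object names untouched.
        f"{base}/n/{namespace}/b/{bucket}/o/{quote(name)}"
        for name in object_names
    ]


def urls_to_csv(urls, column_name="document_url"):
    """Render the URLs as CSV text with a single (hypothetical) column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column_name])
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


# Placeholder values; substitute your own region, namespace, and bucket.
urls = build_object_urls(
    "us-ashburn-1",
    "mynamespace",
    "input-docs",
    ["invoice 001.pdf", "receipts/r-17.png"],
)
csv_text = urls_to_csv(urls)
```

The resulting text has a header row followed by one row per document, so each row of the dataset points at exactly one document rather than at the bucket as a whole.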
In a single run, Oracle Analytics data flows can process up to 10,000 documents for pretrained models and 2,000 documents for custom models. If you have more documents than can be processed in one run, create multiple buckets in OCI's Object Storage & Archive Storage, each containing no more than the maximum number of documents. Then create a separate dataset and data flow for each bucket, and use a sequence to process the data flows one after another.
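To plan how many buckets you need, the per-run limits above can be applied as a simple batching calculation. The sketch below is only a planning aid under stated assumptions: the function name `plan_buckets`, the model-type keys, and the sample document names are hypothetical, and the code does not call any OCI or Oracle Analytics API.

```python
# Per-run limits stated in this topic.
MAX_DOCS_PER_RUN = {"pretrained": 10_000, "custom": 2_000}


def plan_buckets(document_names, model_type):
    """Split document names into groups that each fit in one data flow run.

    Each returned group corresponds to one bucket (and therefore one
    dataset and one data flow).
    """
    limit = MAX_DOCS_PER_RUN[model_type]
    return [
        document_names[start:start + limit]
        for start in range(0, len(document_names), limit)
    ]


# Hypothetical example: 4,500 documents for a custom model.
docs = [f"doc_{i:05d}.pdf" for i in range(4_500)]
batches = plan_buckets(docs, "custom")
sizes = [len(batch) for batch in batches]
```

In this example the 4,500 documents fall into three groups, so you would create three buckets, three datasets, and three data flows, then chain the data flows in a sequence.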
You can use a private or public bucket that is accessible by the OCI user and that complies with OCI's generic limits on documents. See OCI documentation.