RAG Tool Object Storage Guidelines for Generative AI Agents

Review the following sections to prepare Object Storage data for RAG tools in Generative AI Agents.

General Guidelines

Follow these guidelines to prepare data for Generative AI Agents data sources before uploading to Object Storage:

Data Sources: Data for Generative AI Agents must be uploaded as files to an Object Storage bucket.
Number of Buckets: Only one bucket is allowed per data source.
Supported File Types: PDF, txt, JSON, HTML, and Markdown (MD) files are supported.
File Size Limit: Each file must be no larger than 100 MB. Any files that exceed the limit are ignored. For other requirements, see File Type Requirements and Support.
URLs: All the hyperlinks present in the documents are extracted and displayed as hyperlinks in the chat response.
Data Not Ready: If your data isn't yet available, create an empty folder for the data source and populate it later. This way, you can ingest data into the source after the folder is populated.
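
As a rough pre-upload check, the file type and size rules above could be sketched as follows. The function name `check_upload_candidate` is hypothetical, the extension comparison is deliberately strict (case-sensitive), and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption, because the exact byte count isn't stated here.

```python
from pathlib import Path

# Extensions taken from the supported file types listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".json", ".html", ".md"}
# Assumption: 100 MB is interpreted as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_upload_candidate(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(name).suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {name}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file exceeds 100 MB: {name}")
    return problems
```

Running a check like this locally before uploading can help catch files that would otherwise be ignored during ingestion.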

Note

Set up the following Object Storage permissions before you proceed.

User access to Object Storage files
Data ingestion job access to Object Storage files for long-running jobs

See Getting Access for the permissions.

File Type Requirements and Support

Data source files must be uploaded to Object Storage. Ensure that the requirements are met for the type of file to be ingested.

PDF
txt
JSON
HTML
MD (Markdown)

PDF

The requirements and support for ingestion of PDF files are as follows:

File extension: Must be .pdf
File size: A single file must not exceed 100 MB.
File password: If a PDF file is password-protected, a file failure is recorded in the status logs.
Contents: A PDF file can include images, charts, and reference tables, but these must not exceed 8 MB.
Chart preparation: No special preparation is needed for charts, as long as they're two-dimensional with labeled axes. The model can answer questions about the charts without explicit explanations.
Table preparation: Use reference tables with several rows and columns. For example, the agent can read the table on the limits page.

txt

The requirements and support for ingestion of txt files are as follows:

File extension: Must be .txt
File size: A single file must not exceed 100 MB.

JSON

The requirements and support for ingestion of JSON files are as follows:

File extension: Must be .json
File size: A single file must not exceed 100 MB.
Encoding: Only UTF-8 encoding in English is supported. The JSON structured data can contain key-value pairs, arrays, and nested objects.
Depth of nesting: The depth of structure must not exceed 50.
List limit: A list inside the JSON structure must not be longer than 10000 items.
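
The depth and list limits above can be checked locally before uploading. The following is a minimal sketch, assuming that the outermost object or array counts as depth 1 (the counting convention isn't stated here). The function name `json_problems` is hypothetical.

```python
import json

MAX_DEPTH = 50           # depth limit stated above
MAX_LIST_ITEMS = 10000   # list limit stated above


def json_problems(value, depth=1):
    """Recursively collect violations of the depth and list-length limits."""
    problems = []
    if isinstance(value, (dict, list)):
        # Assumption: the top-level container is counted as depth 1.
        if depth > MAX_DEPTH:
            return [f"nesting depth exceeds {MAX_DEPTH}"]
        if isinstance(value, list) and len(value) > MAX_LIST_ITEMS:
            problems.append(f"list has {len(value)} items")
        children = value.values() if isinstance(value, dict) else value
        for child in children:
            problems.extend(json_problems(child, depth + 1))
    return problems


# Example usage with a small document parsed from text.
document = json.loads('{"a": [1, 2, {"b": "c"}]}')
print(json_problems(document))
```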

HTML

The requirements and support for ingestion of HTML files are as follows:

File extension: Must be .html
File size: A single file must not exceed 100 MB.
Contents: Only visible content is ingested. Any dynamic content is not ingested and script tags are stripped.
Images: Images that are referenced in a file can be processed if the image source is not an external HTTP URL or an absolute path. Any images that don't meet the following requirements are ignored.
- Only JPEG images (.jpg or .jpeg) are supported.
- A single image must not exceed 6 MB. Any images exceeding the limit are ignored.
- Images must be uploaded to Object Storage at the same level as the uploaded HTML file or below it.
- The source path (src attribute) to each image must be a path relative to the parent HTML file. For example:
```
<img src="./my-image.jpg">
<img src="./myfolder/my-imagetwo.jpg">
```
- The source path (src attribute) to each image must not specify URLs (http, https, or data)
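
To spot images that are likely to be ignored, the `src` rules above can be approximated with the Python standard library. This is a sketch only: the names `ImgSrcCollector` and `ignored_image_sources` are hypothetical, treating a leading `/` as an absolute path and `..` as pointing above the HTML file are assumptions, and image size isn't checked.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def ignored_image_sources(html_text: str) -> list[str]:
    """Return src values that would likely be ignored under the rules above."""
    collector = ImgSrcCollector()
    collector.feed(html_text)
    ignored = []
    for src in collector.sources:
        lowered = src.lower()
        is_url = lowered.startswith(("http:", "https:", "data:"))
        is_absolute = src.startswith("/")          # assumption: leading slash means absolute
        points_above = ".." in src.split("/")      # assumption: image would sit above the HTML file
        is_jpeg = lowered.endswith((".jpg", ".jpeg"))
        if is_url or is_absolute or points_above or not is_jpeg:
            ignored.append(src)
    return ignored
```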

MD (Markdown)

The requirements and support for ingestion of MD (Markdown) files are as follows:

File extension: Must be .md
File size: A single file must not exceed 100 MB.
Images: Images are ignored and not processed.

Ensuring Enhanced Table Understanding

Enhanced table understanding, a feature of RAG tools, aims to enhance the accuracy of responses to queries with answers embedded in PDF table data. It processes these tables to generate more precise and relevant responses aligned with the information they contain. In general, the RAG tools can read the tables. For the RAG tool to read the tables with enhanced table understanding, ensure that the tables have the following features:

All cells of the table are separated with visible lines or object boundaries from other cells, including the header names in the first row.
All columns including the first column have a header name.
Each table has more than one column and more than one row, excluding the row with header names.

Tables that are ingested with enhanced table understanding are listed when you ingest the data. Example message:

Count of tables that support enhanced table understanding in following PDFs:
      - enhanced_table_test_data/2025_Report1.pdf has 4 tables processed successfully
      - enhanced_table_test_data/2025_Report2.pdf has 3 tables processed successfully
      - enhanced_table_test_data/2025_Report3.pdf has 3 tables processed successfully

Enhancing Responses with Metadata Filtering

Use predefined metadata to apply filters during a chat. When filters are applied, an agent's searches in a chat session are limited to data files that are associated with the metadata, helping the model generate answers relevant to the content scope, thus enhancing the agent's response accuracy and relevance.

The following steps describe an overview of how to use the metadata filtering feature. After you understand the workflow overview, review the details for your use case in the sections provided after the overview steps.

In a text editor, create the metadata schema, which is required for the filters that you want to be made available. Write the schema in JSON format. Name the file _metadata_schema.json.

Example:

{
    "metadataSchema": [
        {
            "name": "publication_year",
            "type": "integer"
        },
        {
            "name": "title",
            "type": "string"
        }
    ]
}

Upload the _metadata_schema.json file created in step 1 to the root level of the Object Storage bucket that contains the data files for a knowledge base.
Create JSON files to associate data files with the predefined metadata and provide the metadata values.
Example:
```
{
    "metadataAttributes": {
        "publication_year": 2020
    }
} 
```
You can associate one or more data files or all files in a bucket with the metadata. For details about the JSON file name conventions to use for the options you choose, see Metadata Filter Options (File Name and Location).
Upload the JSON files created in step 3 to the Object Storage bucket that contains the data files for a knowledge base. For each option, ensure that you save the file in the correct location in the hierarchy.
Create a knowledge base. Select Object Storage as the data store type, and the option to automatically start the ingestion job.
When the data files are ingested, Generative AI Agents creates a list of the metadata names and the values that can be selected in a chat. To view the ingested metadata names and values, see Getting a Knowledge Base's Details in Generative AI Agents.
Create an agent with a RAG tool, selecting the knowledge base created in step 5. In the agent, select the option to automatically create an endpoint. If you need help, see Creating an Agent and Creating a RAG tool.
In a chat window, add one or more predefined metadata filters and select the values to apply. See Use Metadata Filters in a Chat.

Note

Review the following sections to learn more about preparing metadata JSON files for your use case and how to add and apply metadata filters in a chat session.

Metadata Filter Options (File Name and Location)

Select one or more of the following methods that work best for you.


Method	File Name and Location	Usage
Include metadata for all the files in a bucket without mentioning the file names.	Create a `_common.metadata.json` file at the Object Storage root level.	Use this file for metadata that's common to all files in the bucket. This method helps avoid entering metadata duplicates across objects.
In one file create a metadata entry for each file in a bucket and include the file names.	Create an `_all.metadata.json` file at the Object Storage root level.	Use this method if you have a lot of files and creating one file that includes all the file names is more convenient for you than creating one metadata file per file.
Create a metadata file for each file in a bucket.	Create a `<file-name>.metadata.json` file for each file, at the file level. `<file-name>` must match the name of the data file in the bucket.	Use this method when metadata differs for each file and there aren't many files to create a metadata file for, or if you're automating the creation of the metadata files.
Add Object Storage metadata headers to each file.	Add a metadata header through each file's Object Storage metadata property.	Use this method if you have few metadata properties to include. We recommend that you use the other methods with JSON files, because files are easier to update and manage and metadata headers are difficult to update.

For all methods, you must define a metadata schema file called _metadata_schema.json at the Object Storage bucket root level.

Here's an example hierarchy of where you save the metadata files that you need.

An image that shows hierarchy for metadata files in Object Storage. The bucket_root has the following files: _all.metadata.json, _common.metadata.json, _metadata_schema.json, file_0.pdf, file_0.pdf.metadata.json, folder_1, and folder_2. Then, folder_1 includes file_1.pdf and file_1.pdf.metadata.json and folder_2 includes file_2.pdf and file_2.pdf.metadata.json.
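
The naming convention in this hierarchy can be expressed as a small helper. This is an illustrative sketch with hypothetical function names. Only `_metadata_schema.json` is required; the other two root-level files and the per-file metadata files are optional, depending on the methods you choose.

```python
# Root-level metadata files named in the hierarchy above.
ROOT_LEVEL_FILES = [
    "_metadata_schema.json",
    "_common.metadata.json",
    "_all.metadata.json",
]


def per_file_metadata_name(object_name: str) -> str:
    """Return the name of the metadata file that sits next to a data file."""
    return f"{object_name}.metadata.json"


def expected_metadata_objects(data_objects: list[str]) -> list[str]:
    """List every metadata object name that could accompany the given data files."""
    return ROOT_LEVEL_FILES + [per_file_metadata_name(name) for name in data_objects]
```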

Metadata JSON File Examples

The following steps use examples to show how to format the metadata JSON files. See also Limits for Metadata Filtering.

Create a metadata schema file called _metadata_schema.json and save it at the Object Storage root level. For example:

{
    "metadataSchema": [
        {
            "name": "field_1",
            "type": "integer"
        },
        {
            "name": "field_2",
            "type": "string"
        },
        {
            "name": "field_3",
            "type": "list_of_string"
        },
        {
            "name": "field_4",
            "type": "double"
        }  
    ]
}

Each metadata filter has a name and a type.

The values allowed for type are integer, string, list_of_string, and double.

Example:

{
    "metadataSchema": [
        {
            "name": "rating",
            "type": "double"
        },
        {
            "name": "publication_year",
            "type": "integer"
        },
        {
            "name": "title",
            "type": "string"
        },
        {
            "name": "topic",
            "type": "list_of_string"
        }
    ]
}
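
If you generate the schema file with a script, a sketch like the following could produce the JSON text and reject types outside the four allowed values. The function name `build_metadata_schema` is hypothetical, and writing the result to `_metadata_schema.json` and uploading it is left to you.

```python
import json

# The four type values that the schema allows, as listed above.
ALLOWED_TYPES = {"integer", "string", "list_of_string", "double"}


def build_metadata_schema(fields: dict[str, str]) -> str:
    """Return JSON text for _metadata_schema.json from a name -> type mapping."""
    for name, type_name in fields.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type for {name}: {type_name}")
    schema = {
        "metadataSchema": [
            {"name": name, "type": type_name} for name, type_name in fields.items()
        ]
    }
    return json.dumps(schema, indent=4)


print(build_metadata_schema({"rating": "double", "publication_year": "integer"}))
```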

Sample index mapping for OpenSearch, integer:

"publication_year": {
  "type": "integer"         
}

Sample index mapping for OpenSearch, string:

"title": {
  "type": "text",
  "fields": {
      "keyword": {
          "type": "keyword"
      },
      "search_as_you_type": {
          "type": "search_as_you_type"
      }
   }
}

Sample index mapping for OpenSearch, list_of_string:

"publishers": {
  "type": "text",
  "fields": {
      "keyword": {
          "type": "keyword"
      },
      "search_as_you_type": {
          "type": "search_as_you_type"
      }
   }
}

Sample index mapping for OpenSearch, double:

"rating": {
  "type": "double"
}

(Optional) Create a JSON file called _common.metadata.json for metadata that's common to all files in a bucket. For example:

{
    "metadataAttributes": {
        "field_1": value_1,
        "field_2": value_2,
        "field_3": value_3,
        ......,
        "field_n": value_n
    }
}

Example:

{
    "metadataAttributes": {
        "rating": 3.3,
        "publication_year": 2020,
        "topic": [
            "cooking",
            "health",
            "gardening"
        ]
    }
}
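
Because each metadata value should agree with the type declared in the schema, a local consistency check can be useful before uploading. The following sketch uses the hypothetical name `type_mismatches`. How strictly the service itself enforces types isn't stated here, so treat the rules in the code as assumptions.

```python
def type_mismatches(schema: dict, attributes: dict) -> list[str]:
    """Compare metadataAttributes values with the types declared in the schema."""
    # Assumption: booleans are not accepted as numbers, and an integer is a valid double.
    checks = {
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
        "string": lambda v: isinstance(v, str),
        "list_of_string": lambda v: isinstance(v, list)
        and all(isinstance(item, str) for item in v),
    }
    declared = {field["name"]: field["type"] for field in schema["metadataSchema"]}
    problems = []
    for name, value in attributes.items():
        if name not in declared:
            problems.append(f"{name}: not declared in schema")
        elif not checks[declared[name]](value):
            problems.append(f"{name}: expected {declared[name]}")
    return problems
```

For example, a quoted year such as "2020" would be reported for a field declared as integer.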

(Optional) Create a JSON file called _all.metadata.json. In this one file, add metadata for each file by name. For example:

{
    "folder_1/file_1.pdf" : {
        "metadataAttributes": {
            "field_1": value_1,
            "field_2": value_2,
            "field_3": value_3,
            ......,
            "field_n": value_n
        }
    },
    "folder_2/file_2.pdf": {
        "metadataAttributes": {
            "field_1": value_1,
            "field_2": value_2,
            "field_3": value_3,
            ......,
            "field_n": value_n
        }
    }
}
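
When there are many files, the `_all.metadata.json` content can be generated rather than typed by hand. This is a minimal sketch with a hypothetical function name. The 10,000-entry check reflects the limit listed in Limits for Metadata Filtering.

```python
import json

MAX_ENTRIES = 10000  # maximum number of entries in _all.metadata.json


def build_all_metadata(per_file: dict[str, dict]) -> str:
    """Wrap each file's attributes in metadataAttributes, keyed by object name."""
    if len(per_file) > MAX_ENTRIES:
        raise ValueError(f"more than {MAX_ENTRIES} entries")
    document = {
        object_name: {"metadataAttributes": attributes}
        for object_name, attributes in per_file.items()
    }
    return json.dumps(document, indent=4)


print(build_all_metadata({
    "folder_1/file_1.pdf": {"publication_year": 2020},
    "folder_2/file_2.pdf": {"publication_year": 2021},
}))
```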

(Optional) Add metadata separately for files in the bucket by creating a JSON file called <file-name>.metadata.json for a file.
<file-name> must match the name of the data file in the bucket. For example, the data file file1.pdf is associated with metadata defined in the JSON file file1.pdf.metadata.json.
```
{
    "metadataAttributes": {
        "field_1": value_1,
        "field_2": value_2,
        "field_3": value_3
    }
}
```

Note

You can't change or remove the metadata fields after the knowledge base data is ingested. You can add new fields up to the allowed limit. To remove or update a field, recreate the knowledge base.

Use Metadata Filters in a Chat

The following procedure assumes that you have created the required metadata schema and optional metadata filter JSON files, a knowledge base, and an agent with a RAG tool and an endpoint.

Start a chat.
In the panel on the right, expand Metadata filters and select Add filter to add one or more filters.
1. In the Add filter panel, select Add filter and use the menus to select a predefined metadata filter by name, a condition operator, and a metadata value.
  
  For example, you could add the filter topic contains cooking where topic is the metadata filter name, contains is the condition operator, and cooking is the metadata value.
2. Select Add filter to add more filters as needed.
  
  To remove a filter, select the X icon that's at the end of the row.
3. When done, select Add.
  
  The added filters appear under Metadata filters in the chat window. A checkmark next to a filter means that the filter is applied. If you don't want to apply a filter, clear the checkbox next to the filter.
Start a conversation by typing a message and selecting Submit.

During a chat session, you can add or remove metadata filters, and clear or apply filters to continue the conversation.

Limits for Metadata Filtering


Description	Limit
Maximum number of entries in `_all.metadata.json`	10,000
Maximum number of metadata fields that can be specified for each file	20
Maximum number of items in a `list_of_string` type	10
Maximum length of individual item in a `list_of_string` type	50
Maximum length of a metadata key in characters	25
Maximum length of metadata value in characters	50
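
The per-file limits in this table can be checked before upload with a sketch like the following. The function name `limit_violations` is hypothetical, and measuring non-string values by the length of their string form is an assumption.

```python
MAX_FIELDS_PER_FILE = 20
MAX_LIST_ITEMS = 10
MAX_LIST_ITEM_LENGTH = 50
MAX_KEY_LENGTH = 25
MAX_VALUE_LENGTH = 50


def limit_violations(attributes: dict) -> list[str]:
    """Return the limits from the table above that a metadataAttributes object exceeds."""
    problems = []
    if len(attributes) > MAX_FIELDS_PER_FILE:
        problems.append("more than 20 metadata fields")
    for key, value in attributes.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"{key}: key longer than 25 characters")
        if isinstance(value, list):
            if len(value) > MAX_LIST_ITEMS:
                problems.append(f"{key}: more than 10 list items")
            if any(len(str(item)) > MAX_LIST_ITEM_LENGTH for item in value):
                problems.append(f"{key}: list item longer than 50 characters")
        elif len(str(value)) > MAX_VALUE_LENGTH:
            # Assumption: numbers are measured by their string form.
            problems.append(f"{key}: value longer than 50 characters")
    return problems
```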

Adding Metadata to an Object Storage Metadata Header

Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add metadata headers to each file for metadata filtering.

In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
```
allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
```
To create a bucket, follow these steps:
1. Select Create Bucket.
2. Enter a name unique to your region for the bucket.
3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
4. Select Create.
  By default, a new bucket is private. You can change the visibility of a bucket after you create it.
Select the name of the bucket that you want to use.
On the bucket details page, under Objects, select Upload.
(Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
- Type: Metadata
- Name: gaas-metadata-filtering-field-<metadata-name>
- Value: <metadata-value>
Important

For the metadata filtering to work, you must use the prefix gaas-metadata-filtering-field- for the metadata Name.
Object Storage then prepends opc-meta- to the metadata name, so the header is displayed as opc-meta-gaas-metadata-filtering-field-<metadata-name>.

For example, to add metadata with the name publication_year, add a metadata header with the name gaas-metadata-filtering-field-publication_year. When you get the details for this file, the metadata name displays as opc-meta-gaas-metadata-filtering-field-publication_year.

For list values, use the following format:

_LIST_OF_STRING_|list_value_1|list_value_2, where _LIST_OF_STRING_ is fixed, and each list item is separated by a pipe '|' character. This format is decoded as a list of values: {list_value_1, list_value_2}
Add one or more files for the data source and select Upload.
Note
- You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that file, and then delete the old file.
- You can add filters to your chat conversation with an agent using the metadata filtering after the knowledge base data from Object Storage and its metadata are ingested. To learn about adding filters, see step 11 in Chatting with Agents in Generative AI Agents. You can also view details of metadata values after you ingest the data in a knowledge base. See the Metadata resource in Getting a Knowledge Base's Details in Generative AI Agents.
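
The header naming and list format described in this section can be sketched as a small helper. The function name `to_metadata_header` is hypothetical. It returns the Name and Value that you would enter in the upload panel, and Object Storage adds the opc-meta- prefix itself. Rejecting list items that contain a pipe character is an assumption, because no escaping rule is described here.

```python
HEADER_PREFIX = "gaas-metadata-filtering-field-"
LIST_MARKER = "_LIST_OF_STRING_"


def to_metadata_header(name: str, value) -> tuple[str, str]:
    """Return the (Name, Value) pair to enter for one metadata field."""
    if isinstance(value, list):
        # Assumption: a pipe inside an item can't be represented, so reject it.
        if any("|" in item for item in value):
            raise ValueError("list items must not contain '|'")
        encoded = "|".join([LIST_MARKER, *value])
    else:
        encoded = str(value)
    return HEADER_PREFIX + name, encoded


print(to_metadata_header("topic", ["cooking", "health"]))
```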

Adding Data with Custom URL to an Object Storage Bucket

Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add a custom URL to each file for citation.

In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
```
allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
```
To create a bucket, follow these steps:
1. Select Create Bucket.
2. Enter a name unique to your region for the bucket.
3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
4. Select Create.
  By default, a new bucket is private. You can change the visibility of a bucket after you create it.
Select the name of the bucket that you want to use.
On the bucket details page, under Objects, select Upload.
(Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
- Type: Metadata
- Name: customized_url_source
- Value: <Custom-URL-for-the-file>
Important

For the citation link override to work, you must use Name: customized_url_source.
Add one or more files for the data source and select Upload.

Note

If you added the customized_url_source metadata to an object in step 7, this custom URL applies to all the files that you upload for this object. You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that file, and then delete the old file. To add or update a file with the customized_url_source metadata by using OCI CLI, see Assigning a Custom URL to a Citation.

Note

Beta Customers:

If you created a knowledge base in the Beta phase, you might need to delete and re-create the data source for the URL handling feature to work.

Assigning a Custom URL to a Citation

When an agent uses a RAG tool for its responses, you can get citations. By default, the citations point to Object Storage where the files are stored. To reference a URL instead of the file that's being referenced, you can add a custom URL to the metadata object for that file.

This topic shows how to add or update the metadata object through OCI CLI.

Start OCI CLI in an environment or in Cloud Shell. We recommend that you try it in Cloud Shell first to become familiar with the commands.
See Get Started with the Command Line Interface.
Get the object name for the file that you want to add a custom URL to:
```
oci os object list --bucket-name <the-bucket-name> \
--file <the-file-name>
```
Example output:
```
"data": [
    {
      "archival-state": null,
      "etag": "xxx",
      "md5": "xxx==",
      "name": "<the-object-name>",
      "size": 1117630,
      "storage-tier": "Standard",
      "time-created": "2025-03-12T22:21:26.991000+00:00",
      "time-modified": "2025-03-12T22:38:10.217000+00:00"
    },
Other objects are listed similarly after this comma.
```
You can also find the object name in the Console. In the bucket details page, select the Actions menu (three dots) for the object, select View Object Details and copy the name.

Note

If a file is in a folder, then the file name and its object name differ. For example, for a file named file1.pdf, its object name could be folder1/file1.pdf. Otherwise, the file name and its object name are the same.
Download the file into the current working directory.
To add or update a file's metadata object, you replace the file with the same file that has a new metadata object. That's why you're copying the file into the current working directory first.
```
oci os object get \
--bucket-name <the-bucket-name> \
--file <the-file-name> \
--name <the-object-name>
```
Find the metadata object values for the current file.
```
oci os object head \
--bucket-name <the-bucket-name> \
--name <the-object-name>
```
Example output:
```
{
 some data

  "opc-client-request-id": "xxx",
  "opc-meta-key1": "value1",
  "opc-meta-key2": "value2",
  "opc-request-id": "xxx",
 ...
}
```
This example shows that the metadata object value is '{"key1":"value1","key2":"value2"}'. The metadata name is saved with a prefix of opc-meta-, but you don't have to add this prefix when you add the metadata name in the next steps. This prefix is added automatically to each metadata name.

Replace the file that's in Object Storage with the same file that's in the current working directory, and add a new metadata object.

To keep the current metadata and add the custom URL name and value, "customized_url_source":"<the-custom-url>", to the metadata object, run the following command:

oci os object put \
--bucket-name <the-bucket-name> \
--file <the-file-name> \
--name <the-object-name> \
--force --metadata \
'{"customized_url_source":"<the-custom-url>",
"<existing-metadata-name-1>":"<existing-metadata-value-1>",
"<existing-metadata-name-2>":"<existing-metadata-value-2>"}'

For example, to keep the metadata names and values displayed in the step 4 example:

oci os object put \
--bucket-name <the-bucket-name> \
--file <the-file-name> \
--name <the-object-name> \
--force --metadata \
'{"customized_url_source":"<the-custom-url>",
"key1":"value1",
"key2":"value2"}'

To replace the existing metadata object so that it includes only the custom URL, run the following command:

oci os object put \
--bucket-name <the-bucket-name> \
--file <the-file-name> \
--name <the-object-name> \
--force --metadata '{"customized_url_source":"<the-custom-url>"}'

Ensure that the metadata object for the custom URL is replaced.

oci os object head \
--bucket-name <the-bucket-name> \
--name <the-object-name>

Example output:

{
 some data

  "opc-meta-customized_url_source": "some-new-link",
 ...
}

Important

The metadata object that overrides the default citation must have the name customized_url_source.
You can have only one metadata object with the name customized_url_source.
Each customized_url_source can have only one URL.
The commands in step 5 work for both adding and updating the metadata object, because they replace the current metadata object's value.
Ensure that you pass the values for the --metadata object with the format shown in the commands in step 5.
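
Because the put command replaces the whole metadata object, keeping existing entries means rebuilding them from the head output. The following sketch shows one way to assemble the JSON string for --metadata. The function name `merged_metadata_argument` is hypothetical, and it assumes that you have already parsed the head output into a dictionary.

```python
import json

META_PREFIX = "opc-meta-"


def merged_metadata_argument(head_output: dict, custom_url: str) -> str:
    """Build the JSON string for --metadata, keeping existing opc-meta-* entries."""
    # Strip the opc-meta- prefix, because it's added automatically on upload.
    metadata = {
        key[len(META_PREFIX):]: value
        for key, value in head_output.items()
        if key.startswith(META_PREFIX)
    }
    # Add or overwrite the single custom URL entry.
    metadata["customized_url_source"] = custom_url
    return json.dumps(metadata)


head = {
    "opc-client-request-id": "xxx",
    "opc-meta-key1": "value1",
    "opc-meta-key2": "value2",
    "opc-request-id": "xxx",
}
print(merged_metadata_argument(head, "<the-custom-url>"))
```

The printed string can then be passed, in single quotes, as the value of --metadata in the put command.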
