Amazon S3 Data Source

Connect an Amazon S3 bucket to Agent Factory so Knowledge Agents can retrieve and ground answers in bucket content.

Amazon S3 data sources let Agent Factory ingest documents stored in S3 buckets. After you create the data source, Agent Factory crawls, parses, chunks, embeds, and indexes the selected bucket content.

Configure an Amazon S3 Data Source

  1. Select Data Sources in the navigation menu.

  2. Select Create new data source.

  3. Select Amazon S3 as the data source type.

  4. Enter a Data source name and Description.

  5. Enter the Bucket Name and AWS Region where the Amazon S3 bucket is hosted.

  6. Enter the Access Key and Secret Access Key that Agent Factory uses to read the bucket content you want to ingest.

    Important: Currently, Amazon S3 data sources support only long-lived access key credentials. Make sure the IAM policy attached to the access key allows s3:ListBucket on the required bucket and s3:GetObject on the objects in that bucket.

  7. To limit the crawl scope and narrow the content to ingest, configure any optional Include Filters, Exclude Filters, and Exclude File Extensions.

  8. Set Crawl Depth and Crawl Frequency to control how much content Agent Factory crawls and how often it revisits the bucket.

  9. Enter a Proxy URL if access to Amazon S3 requires a network proxy.

    Figure: Amazon S3 Configuration

  10. Select Test Connection and confirm that Agent Factory can reach the bucket.

    Note: If the connection test fails, verify the bucket name, AWS Region, and AWS credentials. If Agent Factory cannot reach Amazon S3, verify the network path and proxy configuration.

  11. Select Create Data Source.

    Figure: Amazon S3 Dashboard
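
The permissions required in step 6 can be written as an IAM policy document. The following is a minimal sketch, not an official Agent Factory artifact: the helper name build_read_only_policy and the bucket name example-bucket are hypothetical, and the ARNs assume the standard "aws" partition. Note that s3:ListBucket is granted on the bucket itself, while s3:GetObject is granted on the objects inside it.

```python
import json


def build_read_only_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting list and read access to one bucket.

    Hypothetical helper for illustration; assumes the standard "aws" partition.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # s3:GetObject is an object-level action, so it targets the keys under the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder; substitute the bucket entered in step 5.
    print(json.dumps(build_read_only_policy("example-bucket"), indent=2))
```

The printed JSON can be attached to the IAM user that owns the access key. If only part of the bucket should be readable, the object-level Resource could be narrowed to a key prefix.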

After you create the data source, monitor its ingestion status on the Data Sources page. When the ingestion status is ingested, you can select the Amazon S3 data source while creating or editing a Knowledge Agent. See Create Knowledge Agent.

See Data Sources Troubleshooting for information about data source-related statuses and errors.
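
When a connection test fails, it can help to check the credentials outside Agent Factory first. The following is a rough sketch under stated assumptions: it is not what Test Connection does internally, the function name check_bucket_access is hypothetical, and the client argument is assumed to be a boto3 S3 client (or any object exposing list_objects_v2 and head_object with the same keyword arguments). It exercises the two permissions the data source needs: listing the bucket and reading an object.

```python
from typing import Any, List


def check_bucket_access(client: Any, bucket: str) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical preflight helper. `client` is assumed to behave like a boto3
    S3 client.
    """
    try:
        # Listing a single key exercises s3:ListBucket.
        listing = client.list_objects_v2(Bucket=bucket, MaxKeys=1)
    except Exception as exc:  # broad on purpose: network, auth, and region errors all matter here
        return [f"s3:ListBucket check failed: {exc}"]

    objects = listing.get("Contents", [])
    if not objects:
        # Nothing to read, so s3:GetObject cannot be verified.
        return ["bucket is reachable but contains no objects to read"]

    key = objects[0]["Key"]
    try:
        # A HEAD request requires s3:GetObject but does not download the object body.
        client.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        return [f"s3:GetObject check failed for {key!r}: {exc}"]
    return []


# Example with boto3 (assumed to be installed; all values are placeholders):
#   import boto3
#   client = boto3.client(
#       "s3",
#       region_name="us-east-1",
#       aws_access_key_id="<access key>",
#       aws_secret_access_key="<secret access key>",
#   )
#   print(check_bucket_access(client, "example-bucket"))
```

An empty result suggests the bucket name, AWS Region, and credentials are valid from the machine running the script. Agent Factory may still fail to connect if its own network path or proxy differs from that machine's.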