Query External Tables with Implicit Partitioning

On Autonomous AI Database you can create implicit partitioned external tables from Hive style partitioned data or from simple folder partitioned data stored on your Cloud Object Store.

About External Tables with Implicit Partitioning

On Autonomous AI Database use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create implicit partitioned external tables from Hive style partitioned data or from simple folder partitioned data stored on your Cloud Object Store.

By passing the appropriate options to this procedure, the partitions are derived from the source data. The partitioned external tables support runtime discovery of partition columns and their values. Runtime discovery of changes in the underlying object store structure, such as adding or removing objects, simplifies the maintenance process by eliminating the need for additional synchronization procedures required by DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE. This ensures that the data is up to date at query runtime.

With implicit partitioning, Autonomous AI Database automatically determines the columns a table is “partitioned on” based on the object storage source hierarchical file structure. There is no need to explicitly declare a partitioning scheme. Implicit partitioning provides partition-table-like performance benefits without the need to explicitly define a partitioned external table using the DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure.

In order to optimize query response time for queries of very large data sets using implicit partitioning, you can enable optimization by setting strict_column_order to true. The strict_column_order parameter is a sub-option of the implicit_partition_config format option. This optimization only applies when partition_type is hive. See Query External Implicit Partitioned Data with Hive Format Source File Organization for an example and DBMS_CLOUD Package Format Options for more information on the implicit partitioning format parameters.

Implicit partitioned external tables support the following naming styles for partitioned objects in object store:

Hive Naming Format: Data in object storage that has been generated from Hive has the following format.
```
OBJBUCKET/<table>/<partcol1>=<value1>/<partcol2>=<value2>/file.parquet
```
For an example, See Query External Implicit Partitioned Data with Hive Format Source File Organization
Non-Hive “Pathtail” Naming Format: A second format that is often found in data lakes is similar to the hive format, but the folder name does not include the partition columns.
```
OBJBUCKET/<table>/<value1>/<value2>/file.parquet
```
For an example, See Query External Implicit Partitioned Non-Hive Style Data.

For more information on naming formats, see About External Tables with Source File Partitioning.

Using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure, implicit partitioned external tables can be created in the following ways:

Optimize implicit partitioning. Use the implicit_partition_config option to enable implicit partitioning and optimization of elapsed query time.

In this example, implicit partitioning is enabled by setting partition_type to hive. The only valid value for partition_type is hive. Optimization is enabled by setting setting strict_column_order to true. The partition_columns option specifies which columns are partitioned.

For example:

BEGIN
dbms_cloud.create_external_table (
   table_name        => 'partitions1',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'c varchar2(100), y number, total_sales number',
   format            => '{"type":"parquet",
                          "implicit_partition_config":{
                                    "partition_type":"hive",
                                    "strict_column_order":true,
                                    "partition_columns":["org","tenant","dbtype","year","month","day"]
                          }
                          }');
END;
/

Set the partition type to hive with a given list of partition columns

In this case, implicit_partition_type is set to hive and implicit_partition_columns provides a list of partition columns.

For example:

BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet",
                          "implicit_partition_type":"hive",
                          "implicit_partition_columns":["country","year","month"]}');
END;
/

Set the partition type to hive without providing a list of partition columns

In this case, implicit_partition_type is set to hive and implicit_partition_columns is not provided. The partition columns are automatically detected by searching for '=' in the path specified by file_uri_list. The column_name is to the left of the '=' and the value is on the right side. If a column_name is not found in the path, an error is thrown.

For example:

BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet", "implicit_partition_type":"hive"');
END;
/

Provide a list of partition columns without specifying the type

In this case, implicit_partition_type is not set and implicit_partition_columns provides a list of columns.

For example:

BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet",
                          "implicit_partition_columns":["country","year","month"]}');
END;
/

See Query External Implicit Partitioned Data with Hive Format Source File Organization and CREATE_EXTERNAL_TABLE Procedure for further information.

Query External Implicit Partitioned Data with Hive Format Source File Organization

Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create an implicit partitioned external table from data in Object Store that was generated from Hive data.

The sample source files in the follow example, use this naming format:

OBJBUCKET/<table>/<partcol1>=<value1>/<partcol2>=<value2>/file.parquet

Consider the following sample source files:

OBJBUCKET/sales/country=USA/year=2024/month=01/sales-2024-01.parquet

OBJBUCKET/sales/country=USA/year=2024/month=02/sales-2024-02.parquet

OBJBUCKET/sales/country=USA/year=2024/month=02/sales-2024-03.parquet

To create an implicit partitioned external table with data stored in this sample Hive format, do the following:

Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

For example:
```
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_CREDENTIAL',
    username => 'adb_user@example.com',
    password => 'password' );
END;
/
```
Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.

See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.
Create an implicit partitioned external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE.
1. Case 1: Enable implicit partitioning.
  
  In this example, since the format option implicit_partition_columns is not provided, partition columns are automatically detected and implicit partitioning is enabled by setting the format option implicit_partition_type to hive.
  
  However, if a list of columns is given, then those columns are used as partition columns and Autonomous AI Database doesn't try to discover the columns.
  
  To detect the partition columns Autonomous AI Database starts searching from the beginning of the path, specified by file_uri_list, for '='. When found, the left part of '=' up to the last '/' is taken as column (e.g. “country”) and the right part of '=', until the first '/' as value (e.g. “USA”). The search continues for '=' until after the '/' that follows the first partition value, detecting the second '=' in the path, and so on.
```
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet", "implicit_partition_type":"hive"');
END;
/
```
  The parameters are:
  - table_name: is the external table name.
  - credential_name: is the name of the credential created in the previous step.
  - file_uri_list: is a comma-delimited list of source file URIs. There are two options for this list:
    - Specify a comma-delimited list of individual file URIs without wildcarding.
    - Specify a single file URI with wildcards, where the wildcards can only be after the last slash “/”. The character “*” can be used as the wildcard for multiple characters, the character “?” can be used as the wildcard for a single character.
    In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.
  - column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path specified by file_uri_list).
    
    The column_list is not required when the data files are structured files (Parquet, Avro, or ORC).
  - format: defines the options you can specify to describe the format of the source file. The implicit_partition_type option specifies the data format type as hive.
    
    If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.
    
    See DBMS_CLOUD Package Format Options for more information.
2. Case 2: Optimized implicit partitioning
  
  In this example, implicit partitioning is enabled by specifying the implicit_partition_config format option, and optimized implicit partitioning is enabled by setting strict_column_order to true.
  
  Since partition_columns specifies a list of partitioned columns, those columns are used as partition columns and Autonomous AI Database doesn't try to discover the columns.
```
BEGIN
dbms_cloud.create_external_table (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet",
                          "implicit_partition_config":{
                                    "partition_type":"hive",
                                    "strict_column_order":true,
                                    "partition_columns":["country","year","month","day"]
                          }
                          }');
END;
/
```
  The format parameters are:
  - format: defines the options you can specify to describe the format of the source file. The implicit_partition_config option has the following sub-options:
    - partition_type
      
      The only accepted value is "hive". The default is non-hive filename path. If partition_type is not specified, a client error is returned.
    - partition_columns
      
      List of partitioned columns.
    - strict_column_order
      
      Set to true to enable optimized implicit partitioning. Set to true only for cases where you know the Object Store path will not change and there are no missing prefixes. Set to false or omit to disable.
    If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.
    
    See DBMS_CLOUD Package Format Options for more information.
See CREATE_EXTERNAL_TABLE Procedure for detailed information about the parameters. See DBMS_CLOUD URI Formats for more information on the supported cloud object storage services.
You can now run queries on the external partitioned table you created in the previous step.

Your Autonomous AI Database takes advantage of the partitioning information of your external partitioned table, ensuring that the query only accesses the relevant data files in Object Store.

For example:
```
SELECT product, units FROM mysales WHERE year='2024' and month='02'
```
This SQL statement only queries data in the partition for month 02 of year 2024.

Optimize Object Store Query Planning with the strict_column_order option

When a query targets an object-store folder with a very large number of files and subfolders, the planning and listing phase can become the primary cost before any data is actually scanned. This is common with Hive-style folder layouts where partition values are embedded in the path.

Note: - The strict_column_order setting is disabled by default. Enable it only if the path layout is consistent.

If folder conventions change (such as adding, removing, or reordering partition columns), you must update the partition_columns option and may need to disable this option.

Enable the strict_column_order option by the following:

If your data uses Hive-style partitioned paths and the partition columns always appear in a fixed, consistent order with no missing segments, enable the optimization by the following setting:

strict_column_order = true  (with partition_type = "hive")

The database can skip unnecessary directories by following the order of partitions that are already defined. This means it does not have to list every single object, which can significantly speed up the planning time for large datasets.

Notes for using the strict_column_order option:

Paths follow Hive naming and order, for example:
```
.../country=US/year=2025/month=09/...
```
The partition columns maintain a fixed set and sequence, with no reordering or skipped prefixes.
You need to optimize plan-time listing for folders containing a very large number of objects.
You cannot use this option in datasets where not all prefixes are present. For example, some folders include year=, while others begin with month=).

Examples: Use strict_column_order option for Hive-Style Partitions with DBMS_CLOUD.CREATE_EXTERNAL_TABLE

BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name        => 'sales_xt',
    credential_name   => 'MY_CREDENTIAL',
    file_uri_list     => 'https://objectstorage.example.com/n/tenant/b/lake/o/sales/',
    -- Data is stored as: .../country=US/year=2025/month=09/*.parquet
    column_list       => 'product       VARCHAR2(100),
                          units         NUMBER,
                          amount        NUMBER,
                          country       VARCHAR2(30),
                          year          NUMBER,
                          month         NUMBER',
    format            => '{
      "type": "parquet",
      "implicit_partition_config": {
        "partition_type": "hive",
        "partition_columns": ["country","year","month"],
        "strict_column_order": true
      }
    }'
  );
END;
/

Verify that the planning logic avoids listing unrelated top-level paths (e.g., paths belonging to other countries or years):

Run a selective query that constrains leading partitions, for example:

SELECT COUNT(*) FROM sales_xt WHERE country = ''US'' AND year = 2025;

with

strict_column_order = true

If the layout assumptions are not met, disable the option and retry.

Query External Implicit Partitioned Non-Hive Style Data

Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create an implicit partitioned external table from data in object storage that has been generated from non-Hive data.

The sample source files in the follow example, use this naming format:

OBJBUCKET/<table>/<value1>/<value2>/file.parquet

Consider the following sample source files:

OBJBUCKET/sales/USA/2024/01/sales-2024-01.parquet

OBJBUCKET/sales/USA/2024/01/sales-2024-02.parquet

OBJBUCKET/sales/USA/2024/01/sales-2024-03.parquet

To create an implicit partitioned external table with data stored in this sample Hive format, do the following:

Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.

For example:
```
BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'MY_CREDENTIAL',
    username => 'adb_user@example.com',
    password => 'password' );
END;
/
```
Creating a credential to access Oracle Cloud Infrastructure Object Store is not required if you enable resource principal credentials. See Use Resource Principal to Access Oracle Cloud Infrastructure Resources for more information.

This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.

See CREATE_CREDENTIAL Procedure for information about the username and password parameters for different object storage services.
Create an implicit partitioned external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE.

In this example, implicit partitioning is enabled by providing the partition columns with the format option implicit_partition_columns. Since the folder name does not include the partition columns, the partition values in the path, specified by file_uri_list, can only be discovered if an explicit list of columns is given with the format option implicit_partition_columns. In order to detect the respective column values in the path the order of columns must be the same as the order of values.

The partition columns are automatically detected.
```
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
   table_name        => 'mysales',
   credential_name   => 'MY_CREDENTIAL',
   file_uri_list     => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
   column_list       => 'product varchar2(100), units number, country varchar2(100),year (number), month varchar2(2)',
   format            => '{"type":"parquet", "implicit_partition_columns":["country","year","month"]}');
END;
/
```
The parameters are:
- table_name: is the external table name.
- credential_name: is the name of the credential created in the previous step.
- file_uri_list: is a comma-delimited list of source file URIs. There are two options for this list:
  - Specify a comma-delimited list of individual file URIs without wildcarding.
  - Specify a single file URI with wildcards, where the wildcards can only be after the last slash “/”. The character “*” can be used as the wildcard for multiple characters, the character “?” can be used as the wildcard for a single character.
- column_list: is a comma delimited list of column names and data types for the external table. The list includes the columns inside the data file and those derived from the object name (from names in the file path specified by file_uri_list).
  
  The column_list is not required when the data files are structured files (Parquet, Avro, or ORC).
- format: defines the options you can specify to describe the format of the source file. The implicit_partition_type option is unset. Since implicit_partition_columns is given, the type is automatically detected as non-hive.
  
  If the data in your source file is encrypted, decrypt the data by specifying the encryption format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.
  
  See DBMS_CLOUD Package Format Options for more information.
In this example, namespace-string is the Oracle Cloud Infrastructure object storage namespace and bucketname is the bucket name. See Understanding Object Storage Namespaces for more information.

See CREATE_EXTERNAL_TABLE Procedure for detailed information about the parameters.

See DBMS_CLOUD URI Formats for more information on the supported cloud object storage services.
You can now run queries on the external partitioned table you created in the previous step.

Your Autonomous AI Database takes advantage of the partitioning information of your external partitioned table, ensuring that the query only accesses the relevant data files in Object Store.

For example:
```
SELECT product, units FROM mysales WHERE year='2024'
```
This SQL statement only queries data in the partition for year 2024.