17.4 ORACLE_BIGDATA Accessing Files

To use ORACLE_BIGDATA, you provide information in an access parameter to indicate how to access and parse the data.

To access the external object store, specify the file format type in the access parameter com.oracle.bigdata.fileformat, using one of the following values: csv, textfile, avro, parquet, orc, or jsondoc:
com.oracle.bigdata.fileformat=[csv|textfile|avro|parquet|orc|jsondoc]

You can also use ORACLE_BIGDATA to access local files for testing or simple querying. In this case, the LOCATION field value is the same as what you would use for ORACLE_LOADER: an Oracle directory object followed by the name of the file. For local files, a credential object is not required. However, you must have privileges on the directory object in order to access the file. For a list of all access parameters, see:

ORACLE_BIGDATA Access Parameters
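For example, a local-file table can be declared as follows. This is a minimal sketch: the directory object DATA_PUMP_DIR and the file name customers.csv are placeholders for your own names.

```sql
-- Query a local CSV file through a directory object. No credential object is
-- needed, but the querying user must have privileges on the directory object.
CREATE TABLE CUSTOMERS_LOCAL_XT
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME  VARCHAR2(64)
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    DEFAULT DIRECTORY DATA_PUMP_DIR
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=csv
    )
    LOCATION ('customers.csv')
);
```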

17.4.1 Syntax Rules for Specifying Properties

The properties are set using keyword-value pairs in the SQL CREATE TABLE ACCESS PARAMETERS clause and in the configuration files.

The syntax must obey these rules:

  • The format of each keyword-value pair is a keyword, a separator (either a colon or an equal sign), and a value. The following are valid keyword-value pairs:

    keyword=value
    keyword:value

    The value is everything from the first non-whitespace character after the separator to the end of the line. Whitespace between the separator and the value is ignored. Trailing whitespace for the value is retained.

  • A property definition can span multiple lines. When this happens, precede the line terminators with a backslash (escape character), except on the last line. For example:

    Keyword1=  Value part 1 \
               Value part 2 \
               Value part 3
  • Special characters can be embedded in a property name or property value by preceding the character with a backslash (escape character). The following table describes the special characters:

Table 17-1 Special Characters in Properties

Escape Sequence    Character
\b                 Backspace (\u0008)
\t                 Horizontal tab (\u0009)
\n                 Line feed (\u000a)
\f                 Form feed (\u000c)
\r                 Carriage return (\u000d)
\"                 Double quote (\u0022)
\'                 Single quote (\u0027)
\\                 Backslash (\u005c)
\uxxxx             Unicode code point

Note:

When multiple backslashes are at the end of a line, the parser continues the value to the next line only for an odd number of backslashes.
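As an illustration of these rules, here is a hypothetical ACCESS PARAMETERS fragment that uses the \t escape and a line continuation; the parameter values are examples only.

```sql
ACCESS PARAMETERS
(
    -- \t is the escaped horizontal tab character (see Table 17-1)
    com.oracle.bigdata.csv.rowformat.fields.terminator='\t'
    -- a single trailing backslash continues the property value on the next line
    com.oracle.bigdata.fields=CUSTOMER_ID, \
                              FIRST_NAME
)
```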

17.4.2 ORACLE_BIGDATA Common Access Parameters

There is a set of access parameters that are common to all file formats. Some parameters are unique to specific file formats.

Common Access Parameters

The following table lists parameters that are common to all file formats accessed through ORACLE_BIGDATA. The first column identifies each access parameter common to all data file types. The second column describes each parameter.

Note:

Parameters that are specific to a particular file format cannot be used for other file formats.

Table 17-2 Common Access Parameters

Common Access Parameter Description

com.oracle.bigdata.fileformat

Specifies the format of the file. The value of this parameter identifies the reader that processes the file. Each reader can support additional access parameters that may or may not be supported by other readers.

Valid values: parquet, orc, textfile, avro, csv, jsondoc
  • parquet - file uses Parquet data file format
  • orc - file uses ORC columnar storage file format
  • textfile - file uses text file format
  • avro - file uses Avro file format
  • csv - file uses CSV text file format
  • jsondoc - reads a JSON file. The JSON values are mapped to a single JSON column that may be queried using SQL/JSON.

Default: parquet
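For example, a jsondoc source can be queried through a single JSON column using SQL/JSON dot notation. The bucket URI, credential name, and field names in this sketch are placeholders.

```sql
-- Each JSON document in the files maps to one row in the single JSON column.
SELECT t.data.customer_id, t.data.email
FROM EXTERNAL
(
    (data JSON)
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=jsondoc
        com.oracle.bigdata.credential.name=OCI_CRED
    )
    LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_json/')
) t;
```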

com.oracle.bigdata.log.opt

Specifies whether log messages should be written to a log file. When none is specified, no log file is created. If the value is normal, then a log file is created when the file reader decides to write a message. It is up to the file reader to decide what is written to the log file.

Valid values: normal, none

Default: none.

com.oracle.bigdata.log.qc

Specifies the name of the log file created by the parallel query coordinator. This parameter is used only when com.oracle.bigdata.log.opt is set to normal. The valid values are the same as specified for this parameter in ORACLE_HIVE and ORACLE_HDFS.

com.oracle.bigdata.log.exec

Specifies the name of the log file created during query execution. This value is used (and is required) only when com.oracle.bigdata.log.opt is set to normal. The valid values are the same as specified for this parameter in ORACLE_HIVE and ORACLE_HDFS.

com.oracle.bigdata.credential.name

Specifies the credential object to use when accessing data files in an object store.

This access parameter is required for object store access. It is not needed for access to files through a directory object or for data stored in public buckets.

The name specified for the credential must be the name of a credential object in the same schema as the table owner. Granting a user SELECT or READ access to this table means that the credential is used to access the table.

Use DBMS_CREDENTIAL.CREATE_CREDENTIAL in the DBMS_CREDENTIAL PL/SQL package to create the credential object. For example:

exec dbms_credential.create_credential(credential_name => 'MY_CRED',username =>'username', password => 'password');

In the CREATE TABLE statement, set the value of the credential parameter to the name of the credential object. For example:

com.oracle.bigdata.credential.name=MY_CRED

com.oracle.bigdata.credential.schema

Specifies the schema in which the credential object for accessing Object Stores is created. This parameter is used in the ACCESS PARAMETERS clause when creating external tables with the ORACLE_BIGDATA access driver.
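For example, the following hypothetical fragment references a credential object owned by a different schema (ADMIN is a placeholder):

```sql
ACCESS PARAMETERS
(
    com.oracle.bigdata.fileformat=parquet
    com.oracle.bigdata.credential.name=MY_CRED
    -- look up MY_CRED in the ADMIN schema instead of the table owner's schema
    com.oracle.bigdata.credential.schema=ADMIN
)
```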

17.4.3 ORACLE_BIGDATA Specific Access Parameters

Avro, Parquet, Textfile and CSV all have specific access parameters.

17.4.3.1 Avro-Specific Access Parameters

In addition to common access parameters, there are parameters that apply only to the Avro file format.

The first column in this table identifies the access parameters specific to the Avro file format and the second column describes the parameter. There is only one Avro-specific parameter at this time.

Note:

Parameters that are specific to a particular file format cannot be used for other file formats.

Table 17-3 Avro-Specific Access Parameters

Avro-Specific Parameter Description

com.oracle.bigdata.avro.decimaltype

Specifies the representation of a decimal stored in the byte array.

Valid values: int, integer, str, string

Default: If this parameter is not used, then an Avro decimal column is read assuming byte arrays store the numerical representation of the values (that is, default to int) as the Avro specification defines.
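For example, if the byte arrays in the source Avro files store the string representation of the decimal values, you might set this parameter as follows (sketch only):

```sql
ACCESS PARAMETERS
(
    com.oracle.bigdata.fileformat=avro
    -- byte arrays hold string representations of the decimal values
    com.oracle.bigdata.avro.decimaltype=string
)
```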

17.4.3.2 Examples of Creating External Tables with Avro Files

The following examples demonstrate how to create external tables over Avro files stored in Oracle Cloud Object Storage.

17.4.3.2.1 Creating an External Table with Avro File

This example creates a database object that references external Avro files.

Example 17-7 Creating an External Table

CREATE TABLE CUSTOMERS_AVRO_XT
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME VARCHAR2(64),
    LAST_NAME VARCHAR2(64),
    EMAIL VARCHAR2(64),
    SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=AVRO
        com.oracle.bigdata.credential.name="OCI_CRED"
    )
    LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_avro/')
);
17.4.3.2.2 Creating an Avro File External Table Using DBMS_CLOUD

This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to define an external table with an Avro file format.

Example 17-8 Creating an External Table Using DBMS_CLOUD

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name => 'CUSTOMERS_AVRO_XT',
        credential_name => 'OCI_CRED',
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_avro/*.avro',
        format => '{"type": "avro"}'
    );
END;
/

Note:

You don't have to specify the column list and types; they are automatically derived from the file itself.

17.4.3.3 Parquet-Specific Access Parameters

Some access parameters are only valid for the Parquet file format.

The first column in this table identifies the access parameters specific to the Parquet file format and the second column describes the parameter.

Note:

Parameters that are specific to a particular file format cannot be used for other file formats.

Table 17-4 Parquet-Specific Access Parameters

Parquet-Specific Access Parameter Description

com.oracle.bigdata.prq.binary_as_string

This is a Boolean property that specifies if the binary is stored as a string.

Valid values: true, t, yes, y, 1, false, f, no, n, 0

Default: true

com.oracle.bigdata.prq.int96_as_timestamp

This is a Boolean property that specifies if int96 represents a timestamp.

Valid values: true, t, yes, y, 1, false, f, no, n, 0

Default: true

17.4.3.4 Examples of Creating External Tables with Parquet Files

The following examples demonstrate how to query and create external tables using Parquet files stored in Oracle Cloud Object Storage.

17.4.3.4.1 Querying Parquet External Data with Inline External Table

This example queries data directly from an external Parquet file without creating a database object.

Example 17-9 Querying External Data with Inline External Table

SELECT *
FROM EXTERNAL
(
    (
        CUSTOMER_ID NUMBER,
        FIRST_NAME VARCHAR2(64),
        LAST_NAME VARCHAR2(64),
        EMAIL VARCHAR2(64),
        SIGNUP_DATE VARCHAR2(64)
    )
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=OCI_CRED
    )
    LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/')
) t;
17.4.3.4.2 Creating an External Table Referencing Parquet Files

This example creates a database object that references external Parquet files.

Example 17-10 Creating an External Table

CREATE TABLE CUSTOMERS_PARQ_XT
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME VARCHAR2(64),
    LAST_NAME VARCHAR2(64),
    EMAIL VARCHAR2(64),
    SIGNUP_DATE VARCHAR2(64)
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=parquet
        com.oracle.bigdata.credential.name=OCI_CRED
    )
    LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/')
);
17.4.3.4.3 Creating an External Table Using DBMS_CLOUD Referencing Parquet Files

This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to define an external table with a Parquet file format.

Example 17-11 Creating an External Table Using DBMS_CLOUD

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name => 'CUSTOMERS_PARQ_XT',
        credential_name => 'OCI_CRED',
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/*.parquet',
        format => '{"type": "parquet"}'
    );
END;
/

Note:

You don't have to specify the column list and types; they are automatically derived from the file itself.

17.4.3.5 Textfile and CSV-Specific Access Parameters

The text file and comma-separated value (CSV) file formats are similar to the Hive text file format.

Text file and CSV file formats read text and CSV data from delimited files. ORACLE_BIGDATA automatically detects the line terminator (\n, \r, or \r\n). By default, it assumes the fields in the file are separated by commas and that the order of the fields in the file matches the order of the columns in the external table.
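When the defaults do not fit the source data, the delimiter, quoting, and header-handling parameters described in Table 17-5 can be combined. For example, a pipe-delimited file with one header row and quoted fields might be described by the following hypothetical fragment:

```sql
ACCESS PARAMETERS
(
    com.oracle.bigdata.fileformat=csv
    -- fields are separated by | rather than the default comma
    com.oracle.bigdata.csv.rowformat.fields.terminator='|'
    -- field values are wrapped in double quotes
    com.oracle.bigdata.quote='"'
    -- skip the header row at the start of each file
    com.oracle.bigdata.csv.skip.header=1
    -- treat all-space fields as NULL
    com.oracle.bigdata.blankasnull=true
)
```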

Note:

Parameters that are specific to a particular file format cannot be used for other file formats.

Table 17-5 Textfile and CSV-Specific Access Parameters

Textfile-Specific Access Parameter Description

com.oracle.bigdata.buffersize

Specifies the size of the input/output (I/O) buffer used for reading the file. The value is the size of the buffer in kilobytes. Note that the buffer size is also the largest size that a record can be. If a format reader encounters a record larger than this value, then it will return an error.

Default: 1024

com.oracle.bigdata.blankasnull

When set to true, loads fields consisting of spaces as null.

Valid values: true, false

Default: false

Example: com.oracle.bigdata.blankasnull=true

com.oracle.bigdata.characterset

Specifies the character set of source files.

Valid values: UTF-8

Default: UTF-8

Example: com.oracle.bigdata.characterset=UTF-8

com.oracle.bigdata.compressiontype

If this parameter is specified, then the driver attempts to decompress the data according to the compression scheme specified.

Valid values: gzip, bzip2, zlib, detect

Default: no compression

If detect is specified, then the format reader tries to determine which of the supported compression methods was used to compress the file.

com.oracle.bigdata.conversionerrors

If a row has data type conversion errors, then the related columns are stored as null, or the row is rejected.

Valid values: reject_record, store_null

Default: store_null

Example: com.oracle.bigdata.conversionerrors=reject_record

com.oracle.bigdata.csv.rowformat.nulldefinedas

Specifies the character used to indicate the value of a field is NULL. If the parameter is not specified, then there is no value.

com.oracle.bigdata.csv.rowformat.fields.terminator

Specifies the character used to separate the field values. The character value must be wrapped in single-quotes. Example: '|'.

Default: ','

com.oracle.bigdata.csv.rowformat.fields.escapedby

Specifies the character used to escape any embedded field terminators or line terminators in the value for fields. The character value must be wrapped in single quotes. Example: '\'.

com.oracle.bigdata.dateformat

Specifies the date format in the source file. The format option Auto checks for the following formats:

J, MM-DD-YYYYBC, MM-DD-YYYY, YYYYMMDD HHMISS, YYMMDD HHMISS, YYYY.DDD, YYYY-MM-DD

Default: yyyy-mm-dd hh24:mi:ss

Example: com.oracle.bigdata.dateformat="MON-RR-DD HH:MI:SS"

com.oracle.bigdata.fields

Specifies the order of fields in the data file. The values are the same as for com.oracle.bigdata.fields in ORACLE_HDFS, with one exception – in this case, the data type is optional. Because the data file is text, the text file reader ignores the data types for the fields, and assumes all fields are text. Because the data type is optional, this parameter can be a list of field names.

com.oracle.bigdata.ignoreblanklines

Blank lines are ignored when set to true.

Valid values: true, false

Default: false

Example: com.oracle.bigdata.ignoreblanklines=true

com.oracle.bigdata.ignoremissingcolumns

Missing columns are stored as null.

Valid values: true

Default: true

Example: com.oracle.bigdata.ignoremissingcolumns=true

com.oracle.bigdata.json.ejson

Specifies whether to enable extended JSON.

Valid values: true, t, yes, y, 1, false, f, no, n, 0

Default: true

Example: com.oracle.bigdata.json.ejson=yes

com.oracle.bigdata.json.path

A JSON path expression that identifies a sequence of nested JSON values, which will be mapped to table rows.

Valid values: String property

Default: null

Example: '$.data[*]'

com.oracle.bigdata.json.unpackarrays

Specifies whether to unbox the array found in JSON files. The file consists of an array of JSON objects, and the entire file is a grammatically valid JSON document. An example of such a file is [{"a":1},{"a":2},{"a":3}].

Valid values: true, t, yes, y, 1, false, f, no, n, 0

Default: false

Example: com.oracle.bigdata.json.unpackarrays=true
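As a sketch, a file whose entire content is one JSON array, such as [{"a":1},{"a":2},{"a":3}], could be mapped to one row per array element with the following fragment:

```sql
ACCESS PARAMETERS
(
    com.oracle.bigdata.fileformat=jsondoc
    -- the whole file is one JSON array; map each element to a table row
    com.oracle.bigdata.json.unpackarrays=true
)
```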

com.oracle.bigdata.quote

Specifies the quote character for the fields. The quote characters are removed during loading when specified.

Valid values: character

Default: Null, meaning no quote

Example: com.oracle.bigdata.quote='"'

com.oracle.bigdata.rejectlimit

The operation errors out after the specified number of rows are rejected. This limit applies only to records rejected because of conversion errors.

Valid values: number

Default: 0

Example: com.oracle.bigdata.rejectlimit=2

com.oracle.bigdata.removequotes

Removes any quotes that are around any field in the source file.

Valid values: true, false

Default: false

Example: com.oracle.bigdata.removequotes=true

com.oracle.bigdata.csv.skip.header

Specifies how many rows should be skipped from the start of the files.

Valid values: number

Default: 0, if not specified

Example: com.oracle.bigdata.csv.skip.header=1

com.oracle.bigdata.timestampformat

Specifies the timestamp format in the source file. The format option AUTO checks for the following formats:

YYYY-MM-DD HH:MI:SS.FF, YYYY-MM-DD HH:MI:SS.FF3, MM/DD/YYYY HH:MI:SS.FF3

Valid values: auto

Default: yyyy-mm-dd hh24:mi:ss.ff

Example: com.oracle.bigdata.timestampformat="auto"

com.oracle.bigdata.timestampltzformat

Specifies the timestamp with local timezone format in the source file. The format option AUTO checks for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR, MM/DD/YYYY HH:MI:SS.FF TZR, YYYY-MM-DD HH:MI:SS+/-TZR, YYYY-MM-DD HH:MI:SS.FF3, DD.MM.YYYY HH:MI:SS TZR

Valid values: auto

Default: yyyy-mm-dd hh24:mi:ss.ff

Example: com.oracle.bigdata.timestampltzformat="auto"

com.oracle.bigdata.timestamptzformat

Specifies the timestamp with timezone format in the source file. The format option AUTO checks for the following formats:

DD Mon YYYY HH:MI:SS.FF TZR, MM/DD/YYYY HH:MI:SS.FF TZR, YYYY-MM-DD HH:MI:SS+/-TZR, YYYY-MM-DD HH:MI:SS.FF3, DD.MM.YYYY HH:MI:SS TZR

Valid values: auto

Default: yyyy-mm-dd hh24:mi:ss.ff

Example: com.oracle.bigdata.timestamptzformat="auto"

com.oracle.bigdata.trimspaces

Specifies how the leading and trailing spaces of the fields are trimmed.

Valid values: rtrim, ltrim, notrim, lrtrim, ldrtrim

Default: notrim

Example: com.oracle.bigdata.trimspaces=rtrim

com.oracle.bigdata.truncatecol

If the data in the file is too long for a field, then this option truncates the value of the field rather than rejecting the row or setting the field to NULL.

Valid values: true, false

Default: false

Example: com.oracle.bigdata.truncatecol=true

17.4.3.6 Examples of Creating External Tables

The following examples demonstrate different methods for creating tables and querying external CSV data stored in Oracle Cloud Object Storage.

17.4.3.6.1 Creating an External Table Referencing CSV Files

This example creates a database object that references the external CSV file.

Example 17-12 Creating an External Table

CREATE TABLE CUSTOMERS_CSV_XT
(
    CUSTOMER_ID NUMBER,
    FIRST_NAME VARCHAR2(64),
    LAST_NAME VARCHAR2(64),
    EMAIL VARCHAR2(64),
    SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
    TYPE ORACLE_BIGDATA
    ACCESS PARAMETERS
    (
        com.oracle.bigdata.fileformat=csv
        com.oracle.bigdata.credential.name=OCI_CRED
        com.oracle.bigdata.csv.skip.header=1
        com.oracle.bigdata.dateformat="YYYY-MM-DD"
    )
    LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_csv/')
)
REJECT LIMIT UNLIMITED;
17.4.3.6.2 Creating an External Table with CSV Files Using DBMS_CLOUD

This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to define an external table with detailed specifications for columns and fields.

Example 17-13 Creating an External Table Using DBMS_CLOUD

BEGIN
    DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name => 'CUSTOMERS_CSV_XT',
        credential_name => 'OCI_CRED',
        file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_csv/*.csv',
        column_list => 'CUSTOMER_ID NUMBER,
                      FIRST_NAME VARCHAR2(256),
                      LAST_NAME VARCHAR2(256),
                      EMAIL VARCHAR2(256),
                      SIGNUP_DATE DATE',
        field_list => 'CUSTOMER_ID CHAR,
                      FIRST_NAME CHAR(256),
                      LAST_NAME CHAR(256),
                      EMAIL CHAR(256),
                      SIGNUP_DATE CHAR date_format DATE MASK "YYYY-MM-DD"',
        format => '{
                      "type": "csv",
                      "delimiter": ",",
                      "skipheaders": 1
                  }'
    );
END;
/