17.4 ORACLE_BIGDATA Accessing Files
To use ORACLE_BIGDATA
, you provide information in an
access parameter to indicate how to access and parse the data.
csv
, textfile
,
avro
, parquet
, jsondoc
, or
orc
:com.oracle.bigdata.fileformat=[csv|textfile|avro|parquet|orc|jsondoc]
You can also use ORACLE_BIGDATA
to access local files for
testing or simple querying. In this case, the LOCATION
field value is
the same as what you would use for ORACLE_LOADER
. You can use an Oracle
directory object followed by the name of the file in the LOCATION
field. For local files, a credential object is not required. However, you must have
privileges over on the directory object in order to access the file. For a list of all
files, see:
- Syntax Rules for Specifying Properties
The properties are set using keyword-value pairs in the SQLCREATE TABLE
ACCESS PARAMETERS
clause and in the configuration files. - ORACLE_BIGDATA Common Access Parameters
There is a set of access parameters that are common to all file formats. Some parameters are unique to specific file formats. - ORACLE_BIGDATA Specific Access Parameters
Avro, Parquet, Textfile and CSV all have specific access parameters.
Parent topic: ORACLE_BIGDATA Access Driver
17.4.1 Syntax Rules for Specifying Properties
The properties are set using keyword-value pairs in the SQL CREATE TABLE
ACCESS PARAMETERS
clause and in the configuration files.
The syntax must obey these rules:
-
The format of each keyword-value pair is a keyword, which can be a colon or equal sign, and a value. The following are valid keyword-value pairs:
keyword=value keyword:value
The value is everything from the first non-whitespace character after the separator to the end of the line. Whitespace between the separator and the value is ignored. Trailing whitespace for the value is retained.
-
A property definition can span multiple lines. When this happens, precede the line terminators with a backslash (escape character), except on the last line. For example:
Keyword1= Value part 1 \ Value part 2 \ Value part 3
-
Special characters can be embedded in a property name or property value by preceding the character with a backslash (escape character). The following table describes the special characters:
Table 17-1 Special Characters in Properties
Escape Sequence | Character |
---|---|
|
Backspace ( |
|
Horizontal tab ( |
|
Line feed ( |
|
Form feed ( |
|
Carriage return ( |
|
Double quote ( |
|
Single quote ( |
|
Backslash ( When multiple backslashes are at the end of the line, the parser continues the value to the next line only for an odd number of backslashes. |
|
Unicode code point. |
Note:
When multiple backslashes are at the end of a line, the parser continues the value to the next line only for an odd number of backslashes.Parent topic: ORACLE_BIGDATA Accessing Files
17.4.2 ORACLE_BIGDATA Common Access Parameters
There is a set of access parameters that are common to all file formats. Some parameters are unique to specific file formats.
Common Access Parameters
The following table lists parameters that are common to all file formats
accessed through ORACLE_BIGDATA
. The first column identifies each
access parameter common to all data file types. The second column describes each
parameter.
Note:
Parameters that are specific to a particular file format cannot be used for other file formatsTable 17-2 Common Access Parameters
Common Access Parameter | Description |
---|---|
|
Specifies the format of the file. The value of this parameter identifies the reader that processes the file. Each reader can support additional access parameters that may or may not be supported by other readers. Valid values:
parquet , orc ,
textfile , avro ,
csv , jsondoc
Default:
|
|
Specifies whether log messages should be written to
a log file. When Valid values:
Default:
|
|
Specifies the name of the log file created by the
parallel query coordinator. This parameter is used only when
|
|
Specifies the name of the log file created during
query execution. This value is used (and is required) only when
Valid values:
Default:
|
|
Specifies the credential object to use when accessing data files in an object store. This access parameter is required for object store access. It is not needed for access to files through a directory object or for data stored in public buckets. The name specified for the credential
must be the name of a credential object in the same schema as
the owner of the table. Granting a user DBMS_CREDENTIAL.CREATE_CREDENTIAL in the
DBMS_CREDENTIAL PL/SQL package to create the
credential object. For example:
In the
|
|
Specifies the schema in which the credential object
for accessing Object Stores is created. This parameter is
used in the |
Parent topic: ORACLE_BIGDATA Accessing Files
17.4.3 ORACLE_BIGDATA Specific Access Parameters
Avro, Parquet, Textfile and CSV all have specific access parameters.
- Avro-Specific Access Parameters
In addition to common access parameters, there are parameters that apply only to the Avro file format. - Examples of Creating External Tables with Avro Files
The following examples demonstrate how to query and create external tables using Avro files stored in Oracle Cloud Object Storage. - Parquet-Specific Access Parameters
Some access parameters are only valid for the Parquet file format. - Examples of Creating External Tables with Avro Files
The following examples demonstrate how to query and create external tables using Avro files stored in Oracle Cloud Object Storage. - Textfile and CSV-Specific Access Parameters
The text file and comma-separated value (csv
) file formats are similar to the hive text file format. - Examples of Creating External Tables
The following examples demonstrate different methods for creating tables and querying external CSV data stored in Oracle Cloud Object Storage.
Parent topic: ORACLE_BIGDATA Accessing Files
17.4.3.1 Avro-Specific Access Parameters
In addition to common access parameters, there are parameters that apply only to the Avro file format.
The first column in this table identifies the access parameters specific to the Avro file format and the second column describes the parameter. There is only one Avro-specific parameter at this time.
Note:
Parameters that are specific to a particular file format cannot be used for other file formats.Table 17-3 Avro-Specific Access Parameters
Avro-Specific Parameter | Description |
---|---|
|
Specifies the representation of a decimal stored in the byte array. Valid values:
Default: If this parameter is not used, then an
Avro decimal column is read assuming byte arrays store the numerical
representation of the values (that is, default to
|
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.2 Examples of Creating External Tables with Avro Files
The following examples demonstrate how to query and create external tables using Avro files stored in Oracle Cloud Object Storage.
- Creating an External Table with Avro File
This example creates a database object that references external Avro files. - Creating an Avro File External Table Using DBMS_CLOUD
This example uses theDBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with an Avro file format.
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.2.1 Creating an External Table with Avro File
This example creates a database object that references external Avro files.
Example 17-7 Creating an External Table
CREATE TABLE CUSTOMERS_AVRO_XT
(
CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(64),
LAST_NAME VARCHAR2(64),
EMAIL VARCHAR2(64),
SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_BIGDATA
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=AVRO
com.oracle.bigdata.credential.name="OCI_CRED"
)
LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_avro/')
);
Parent topic: Examples of Creating External Tables with Avro Files
17.4.3.2.2 Creating an Avro File External Table Using DBMS_CLOUD
This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with an Avro file format.
Example 17-8 Creating an External Table Using DBMS_CLOUD
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_AVRO_XT',
credential_name => 'OCI_CRED',
file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_avro/*.avro',
format => '{"type": "avro"}'
);
END;
Note:
You don't have to specify column list and types, The column list and types are automatically derrived from file itself.Parent topic: Examples of Creating External Tables with Avro Files
17.4.3.3 Parquet-Specific Access Parameters
Some access parameters are only valid for the Parquet file format.
The first column in this table identifies the access parameters specific to the Parquet file format and the second column describes the parameter.
Note:
Parameters that are specific to a particular file format can not be used for other file formats.Table 17-4 Parquet-Specific Access Parameters
Parquet-Specific Access Parameter | Description |
---|---|
|
This is a Boolean property that specifies if the binary is stored as a string. Valid values:
Default:
|
|
This is a Boolean property that specifies if
Valid values:
Default:
|
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.4 Examples of Creating External Tables with Avro Files
The following examples demonstrate how to query and create external tables using Avro files stored in Oracle Cloud Object Storage.
- Querying Parquet External Data with Inline External Table
This example queries data directly from an external Parquet file without creating a database object. - Creating an External Table Referencing Parquet Files
This example creates a database object that references external Parquet files. - Creating an External Table Using DBMS_CLOUD Referencing Parquet Files
This example uses theDBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with a Parquet file format.
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.4.1 Querying Parquet External Data with Inline External Table
This example queries data directly from an external Parquet file without creating a database object.
Example 17-9 Querying External Data with Inline External Table
SELECT *
FROM EXTERNAL
(
(
CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(64),
LAST_NAME VARCHAR2(64),
EMAIL VARCHAR2(64),
SIGNUP_DATE VARCHAR2(64)
)
TYPE ORACLE_BIGDATA
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=parquet
com.oracle.bigdata.credential.name=OCI_CRED
)
LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/')
) t;
Parent topic: Examples of Creating External Tables with Avro Files
17.4.3.4.2 Creating an External Table Referencing Parquet Files
This example creates a database object that references external Parquet files.
Example 17-10 Creating an External Table
CREATE TABLE CUSTOMERS_PARQ_XT
(
CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(64),
LAST_NAME VARCHAR2(64),
EMAIL VARCHAR2(64),
SIGNUP_DATE VARCHAR2(64)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_BIGDATA
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=parquet
com.oracle.bigdata.credential.name=OCI_CRED
)
LOCATION ('https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/')
);
Parent topic: Examples of Creating External Tables with Avro Files
17.4.3.4.3 Creating an External Table Using DBMS_CLOUD Referencing Parquet Files
This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with a Parquet file format.
Example 17-11 Creating an External Table Using DBMS_CLOUD
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_PARQ_XT',
credential_name => 'OCI_CRED',
file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_parquet/*.parquet',
format => '{"type": "parquet"}'
);
END;
Note:
You don't have to specify column list and types, The column list and types are automatically derrived from file itself.Parent topic: Examples of Creating External Tables with Avro Files
17.4.3.5 Textfile and CSV-Specific Access Parameters
The text file and comma-separated value (csv
) file formats
are similar to the hive text file format.
Text file and CSV file formats read text and csv
data from
delimited files. ORACLE_BIGDATA
automatically detects the line
terminator (either \n
, \r
, or \r\n
).
By default, it assumes the fields in the file are separated by commas, and the order of
the fields in the file match the order of the columns in the external table.
Note:
Parameters that are specific to a particular file format cannot be used for other file formats.Table 17-5 Textfile and CSV-Specific Access Parameters
Textfile-Specific Access Parameter | Description |
---|---|
|
Specifies the size of the input/output (I/O) buffer used for reading the file. The value is the size of the buffer in kilobytes. Note that the buffer size is also the largest size that a record can be. If a format reader encounters a record larger than this value, then it will return an error. Default:
|
|
When set to Valid values: Default: Example:
|
|
Specifies the character set of source files. Valid values: Default: Example:
|
|
If this parameter is specified, then the code tries
to decompress the data according to the compression scheme
specified.
Valid values:
Default: no compression If
|
|
If a row has data type conversion errors, then the related columns are stored as null, or the row is rejected. Valid values: Default: Example:
|
|
Specifies the character used to indicate the value
of a field is |
|
Specifies the character used to separate the field
values. The character value must be wrapped in single-quotes.
Example: Default:
|
|
Specifies the character used to escape any embedded
field terminators or line terminators in the value for fields.
The character value must be wrapped in single quotes. Example:
|
|
Specifies the date format in the source file. The
format option
Default: Example:
|
|
Specifies the order of fields in the data file. The
values are the same as for |
|
Blank lines are ignored when set to true. Valid values: Default: Example:
|
|
Missing columns are stored as null. Valid values: Default: Example:
|
com.oracle.bigdata.json.ejson |
Specifies whether to enable extended JSON. Valid values: Default: Example:
|
com.oracle.bigdata.json.path |
A JSON path expression that identifies a sequence of nested JSON values, which will be mapped to table rows. Valid values: String property Default: Example: |
com.oracle.bigdata.json.unpackarrays |
Specifies whether to unbox the array found in JSON
files. The file consists of an array of JSON objects. The entire
file is a grammatically valid JSON doc. An example of such a
file is Valid values: Default: Example:
|
|
Specifies the quote character for the fields. The quote characters are removed during loading when specified. Valid values: character Default: Null, meaning no quote Example:
|
|
The operation errors out after specified number of rows are rejected. This only applies when rejecting records due to conversion errors. Valid values: number Default: Example: |
com.oracle.bigdata.removequotes |
Removes any quotes that are around any field in the source file. Valid values: Default: Example: |
|
Specifies how many rows should be skipped from the start of the files. Valid values: number Default: Example:
|
|
Specifies the timestamp format in the source file.
The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with local timezone format in
the source file. The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with timezone format in the
source file. The format option
Valid values: Default: Example:
|
|
Specifies how the leading and trailing spaces of the fields are trimmed. Valid values: Default: Example:
|
|
If the data in the file is too long for a field, then this option truncates the value of the field rather than rejecting the row or setting the field to NULL. Valid values: Default: Example:
|
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.6 Examples of Creating External Tables
The following examples demonstrate different methods for creating tables and querying external CSV data stored in Oracle Cloud Object Storage.
- Creating an External Table Referencing CSV Files
This example creates a database object that references the external CSV file. - Creating an External Table with CSV Files Using DBMS_CLOUD
This example uses theDBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with detailed specifications for columns and fields.
Parent topic: ORACLE_BIGDATA Specific Access Parameters
17.4.3.6.1 Creating an External Table Referencing CSV Files
This example creates a database object that references the external CSV file.
Example 17-12 Creating an External Table
CREATE TABLE CUSTOMERS_CSV_XT
(
CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(64),
LAST_NAME VARCHAR2(64),
EMAIL VARCHAR2(64),
SIGNUP_DATE DATE
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY DATA_PUMP_DIR
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
MISSING FIELD VALUES ARE NULL
SKIP 1
(
CUSTOMER_ID CHAR(10),
FIRST_NAME CHAR(64),
LAST_NAME CHAR(64),
EMAIL CHAR(64),
SIGNUP_DATE DATE 'YYYY-MM-DD'
)
)
LOCATION ('customers.csv')
)
REJECT LIMIT UNLIMITED;
Parent topic: Examples of Creating External Tables
17.4.3.6.2 Creating an External Table with CSV Files Using DBMS_CLOUD
This example uses the DBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure to define an external table with detailed specifications for columns and fields.
Example 17-13 Creating an External Table Using DBMS_CLOUD
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name => 'CUSTOMERS_CSV_XT',
credential_name => 'OCI_CRED',
file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/my_namespace/b/sales_data/o/customers_csv/*.csv',
column_list => 'CUSTOMER_ID NUMBER,
FIRST_NAME VARCHAR2(256),
LAST_NAME VARCHAR2(256),
EMAIL VARCHAR2(256),
SIGNUP_DATE DATE',
field_list => 'CUSTOMER_ID CHAR,
FIRST_NAME CHAR(256),
LAST_NAME CHAR(256),
EMAIL CHAR(256),
SIGNUP_DATE CHAR date_format DATE MASK "YYYY-MM-DD"',
format => '{
"type": "csv",
"delimiter": ",",
"skipheaders": 1
}'
);
END;
/
Parent topic: Examples of Creating External Tables