7 Oracle Big Data SQL Reference
Depending on your Oracle RDBMS release, there may be additional procedures such as CREATE_HYBRID_PARTITIONED_TABLE and SYNCHRONIZE_PARTITIONS_FOR_HIVE. For additional information, see the Oracle Database PL/SQL Packages and Types Reference for your RDBMS release in the Oracle Database Documentation.
7.1 CREATE TABLE ACCESS PARAMETERS Clause
This section describes the properties that you use when creating an external table that uses the ORACLE_HDFS, ORACLE_HIVE, or ORACLE_BIGDATA access drivers. In a CREATE TABLE ORGANIZATION EXTERNAL statement, specify the parameters in the opaque_format_spec clause of ACCESS PARAMETERS.
This section contains the following topics:
7.1.1 Syntax Rules for Specifying Properties
The properties are set using keyword-value pairs in the SQL CREATE TABLE
ACCESS PARAMETERS
clause and in the configuration files.
The syntax must obey these rules:
- The format of each keyword-value pair is a keyword, a colon or equal sign, and a value. The following are valid keyword-value pairs:
  keyword=value
  keyword:value
  The value is everything from the first non-whitespace character after the separator to the end of the line. Whitespace between the separator and the value is ignored. Trailing whitespace for the value is retained.
- A property definition can be on one line or multiple lines.
- A line terminator is a line feed, a carriage return, or a carriage return followed by line feeds.
- When a property definition spans multiple lines, precede each line terminator with a backslash (escape character), except on the last line. In this example, the value of the Keyword1 property is "Value part 1 Value part 2 Value part 3":
  Keyword1=  Value part 1 \
             Value part 2 \
             Value part 3
- You can create a logical line by stripping each physical line of leading whitespace and concatenating the lines. The parser extracts the property names and values from the logical line.
- You can embed special characters in a property name or property value by preceding a character with a backslash (escape character), indicating the substitution. Table 7-1 describes the special characters.
Table 7-1 Special Characters in Properties
Escape Sequence | Character |
---|---|
\b | Backspace (\u0008) |
\t | Horizontal tab (\u0009) |
\n | Line feed (\u000a) |
\f | Form feed (\u000c) |
\r | Carriage return (\u000d) |
\" | Double quote (\u0022) |
\' | Single quote (\u0027) |
\\ | Backslash (\u005c). When multiple backslashes are at the end of the line, the parser continues the value to the next line only for an odd number of backslashes. |
\uxxxx | 2-byte, big-endian, Unicode code point. When a character requires two code points (4 bytes), the parser expects \u for the second code point. |
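For instance (using a made-up keyword name purely for illustration), the following definition embeds a tab and a double quote in a property value by escaping them:
Keyword2=part one\tpart two \"quoted\"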
7.1.2 ORACLE_HDFS Access Parameters
The access parameters for the ORACLE_HDFS access driver provide the metadata needed to locate the data in HDFS and generate a Hive table over it.
7.1.2.1 Default Parameter Settings for ORACLE_HDFS
Describes default parameter settings for ORACLE_HDFS.
If you omit all access parameters from the CREATE TABLE statement, then ORACLE_HDFS uses the following default values:
com.oracle.bigdata.rowformat=DELIMITED
com.oracle.bigdata.fileformat=TEXTFILE
com.oracle.bigdata.overflow={"action":"error"}
com.oracle.bigdata.erroropt={"action":"setnull"}
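As a minimal sketch that relies entirely on these defaults (the directory object, columns, and HDFS path below are hypothetical), an ORACLE_HDFS table can be created with no access parameters at all:
CREATE TABLE sales_hdfs
  (sale_id   NUMBER,
   sale_desc VARCHAR2(100))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_HDFS
   DEFAULT DIRECTORY DEFAULT_DIR
   LOCATION ('hdfs:/user/sales/data/*')
  )
REJECT LIMIT UNLIMITED;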
7.1.2.2 Optional Parameter Settings for ORACLE_HDFS
ORACLE_HDFS supports the following optional com.oracle.bigdata parameters, which you can specify in the opaque_format_spec clause:
Example 7-1 shows a CREATE TABLE statement in which multiple access parameters are set.
Example 7-1 Setting Multiple Access Parameters for ORACLE_HDFS
CREATE TABLE ORDER (
  CUST_NUM    VARCHAR2(10),
  ORDER_NUM   VARCHAR2(20),
  ORDER_DATE  DATE,
  ITEM_CNT    NUMBER,
  DESCRIPTION VARCHAR2(100),
  ORDER_TOTAL NUMBER(8,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HDFS
  ACCESS PARAMETERS (
    com.oracle.bigdata.fields: (CUST_NUM, \
                                ORDER_NUM, \
                                ORDER_DATE, \
                                ORDER_LINE_ITEM_COUNT, \
                                DESCRIPTION, \
                                ORDER_TOTAL)
    com.oracle.bigdata.colMap:   {"col":"item_cnt", \
                                  "field":"order_line_item_count"}
    com.oracle.bigdata.overflow: {"action":"TRUNCATE", \
                                  "col":"DESCRIPTION"}
    com.oracle.bigdata.errorOpt: [{"action":"replace", \
                                   "value":"INVALID NUM", \
                                   "col":["CUST_NUM","ORDER_NUM"]}, \
                                  {"action":"reject", \
                                   "col":"ORDER_TOTAL"}]
  )
  LOCATION ("hdfs:/usr/cust/summary/*")
);
7.1.3 ORACLE_HIVE Access Parameters
ORACLE_HIVE retrieves metadata about external data sources from the Hive catalog.
The default mapping of Hive data to columns in the external table is usually appropriate. However, some circumstances require special parameter settings, or you might want to override the default values for reasons of your own.
7.1.3.1 Default Parameter Settings for ORACLE_HIVE
Describes the default parameter settings for ORACLE_HIVE.
If you omit all access parameters from the CREATE TABLE statement, then ORACLE_HIVE uses the following default values:
com.oracle.bigdata.tablename=name of external table
com.oracle.bigdata.overflow={"action":"error"}
com.oracle.bigdata.erroropt={"action":"setnull"}
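As a minimal sketch of these defaults in action (the table and directory names are hypothetical), an ORACLE_HIVE table created with no access parameters looks for a Hive table named SALES in the default Hive database:
CREATE TABLE sales
  (sale_id   NUMBER,
   sale_desc VARCHAR2(100))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_HIVE
   DEFAULT DIRECTORY DEFAULT_DIR
  )
REJECT LIMIT UNLIMITED;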
7.1.3.2 Optional Parameter Values for ORACLE_HIVE
ORACLE_HIVE supports the following optional com.oracle.bigdata parameters, which you can specify in the opaque_format_spec clause:
Example 7-2 shows a CREATE TABLE statement in which multiple access parameters are set.
Example 7-2 Setting Multiple Access Parameters for ORACLE_HIVE
CREATE TABLE ORDER (
  cust_num    VARCHAR2(10),
  order_num   VARCHAR2(20),
  order_date  DATE,
  item_cnt    NUMBER,
  description VARCHAR2(100),
  order_total NUMBER(8,2)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_hive
  ACCESS PARAMETERS (
    com.oracle.bigdata.tableName: order_db.order_summary
    com.oracle.bigdata.colMap:    {"col":"ITEM_CNT", \
                                   "field":"order_line_item_count"}
    com.oracle.bigdata.overflow:  {"action":"ERROR", \
                                   "col":"DESCRIPTION"}
    com.oracle.bigdata.errorOpt:  [{"action":"replace", \
                                    "value":"INV_NUM", \
                                    "col":["CUST_NUM","ORDER_NUM"]}, \
                                   {"action":"reject", \
                                    "col":"ORDER_TOTAL"}]
  )
);
7.1.4 Full List of Access Parameters for ORACLE_HDFS and ORACLE_HIVE
7.1.4.1 com.oracle.bigdata.buffersize
Sets the buffer size in kilobytes for large record reads. Set this value if you need to read records that are greater than the default buffer size.
Default Value
1000 KB
Syntax
com.oracle.bigdata.buffersize: n
Example
The following example sets the buffer size to 100 MB:
com.oracle.bigdata.buffersize: 100000
7.1.4.2 com.oracle.bigdata.datamode
Specifies the method that SmartScan uses to scan a Hadoop data source. The method can make a significant difference in performance.
Default Value
automatic
Syntax
A JSON document with the keyword-value pairs shown in the following diagram:
datamode:
Semantics
automatic
Automatically selects the appropriate mode, based on the metadata. It selects c mode if possible, or java mode if the data contains formats that are not supported by c mode.
c
Uses Java to read the file buffers, but C code to process the data and convert it to Oracle format. Specify this mode for delimited data.
If the data contains formats that the C code does not support, then it returns an error.
java
Uses the Java SerDes and InputFormats to process the data and convert it to Oracle format. Specify this mode for Parquet, RCFile, and other data formats that require a SerDe.
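Example
The following setting (a sketch; whether you need it depends on your data) forces Java processing for a source that requires a SerDe:
com.oracle.bigdata.datamode=java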
7.1.4.3 com.oracle.bigdata.colmap
Maps a column in the source data to a column in the Oracle external table. You can define one or multiple pairs of column mappings. Use this property when the source field names exceed the maximum length of Oracle column names, or when you want to use different column names in the external table.
Default Value
A column in the external table with the same name as the Hive column
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram:
colmap:
colmap_entry:
Semantics
"col":name
"col"
: The keyword must be lowercase and enclosed in quotation marks.
name
: The name of a column in the Oracle external table. It is case sensitive and must be enclosed in quotation marks.
"field":name
"field"
: The keyword must be lowercase and enclosed in quotation marks.
name
: The name of a field in the data source. It is not case sensitive, but it must be enclosed in quotation marks. See Syntax Rules for Specifying Properties.
Examples
This example maps a Hive column named ORDER_LINE_ITEM_COUNT
to an Oracle column named ITEM_CNT
:
com.oracle.bigdata.colMap={"col":"ITEM_CNT", \ "field":"order_line_item_count"}
The following example shows the mapping of multiple columns.
com.oracle.bigdata.colmap:[{"col":"KOL1", "field":"PROJECT_NAME"},{ "col":"KOL2","field":"wsdl_name"},{"col":"KOL3", "field":"method"}]
7.1.4.4 com.oracle.bigdata.erroropt
Describes how to handle errors that occur while the value of a column is calculated.
Default Value
{"action":"setnull"}
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram:
erroropt:
error_element:
Semantics
The "action"
, "reject"
, "setnull"
, "replace"
, "value"
, and "col"
keywords must be lowercase and enclosed in quotation marks. See Syntax Rules for Specifying Properties.
"action":value
value
: One of these keywords:
-
"reject"
: Does not load any rows. -
"setnull"
: Sets the column toNULL
. -
"replace"
: Sets the column to the specified value.
"value":string
string
: Replaces a bad value in the external table. It must be enclosed in quotation marks.
"col":name
name
: Identifies a column in an external table. The column name is case sensitive, must be enclosed in quotation marks, and can be listed only once.
Example
This example sets the value of the CUST_NUM
or ORDER_NUM
columns to INVALID
if the Hive value causes an error. For any other columns, an error just causes the Hive value to be rejected.
com.oracle.bigdata.errorOpt: {"action":"replace",\ "value":"INVALID", \ "col":["CUST_NUM","ORDER_NUM"]}
7.1.4.5 com.oracle.bigdata.fields
Lists the field names and data types of the data source.
Default Value
Not defined
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram:
fields:
data_type:
primitive_type:
Semantics
The syntax is the same as a field list for a Hive table. If you split the field list across multiple lines, you must use a backslash to escape the new line characters.
field_name
The name of the Hive field. Use only alphanumeric characters and underscores (_). The maximum length is 128 characters. Field names are case-insensitive.
data_type
The data type of the Hive field. Optional; the default is STRING. The character set must be UTF8.
The data type can be complex or primitive:
Hive Complex Data Types
- ARRAY: Indexable list
- MAP: Key-value tuples
- STRUCT: List of elements
- UNIONTYPE: Multiple data types
Hive Primitive Data Types
- INT: 4-byte integer
- BIGINT: 8-byte integer
- SMALLINT: 2-byte integer
- TINYINT: 1-byte integer
- BOOLEAN: TRUE or FALSE
- FLOAT: single precision
- DOUBLE: double precision
- STRING: character sequence
See Also:
"Data Types" in the Apache Hive Language Manual at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
COMMENT col_comment
A string literal enclosed in single quotation marks, which is stored as metadata for the Hive table (comment property of TBLPROPERTIES).
7.1.4.6 com.oracle.bigdata.fileformat
Describes the file format of the data source, based on the STORED AS clause for a Hive table generated by ORACLE_HDFS.
Default Value
TEXTFILE
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram.
fileformat:
Semantics
ORC
Optimized row columnar file format
PARQUET
Column-oriented, binary file format
RCFILE
Record columnar file format
SEQUENCEFILE
Compressed file format
TEXTFILE
Plain text file format
INPUTFORMAT
Identifies a Java class that can extract records from the data file.
OUTPUTFORMAT
Identifies a Java class that can format the output records in the desired format
7.1.4.7 com.oracle.bigdata.log.exec
Specifies how the access driver generates log files for the C code of a query when it runs as parallel processes on CDH.
The access driver does not create or write log files when executing on a Hadoop cluster node; the parallel query processes write them. The log files from the Java code are controlled by log4j properties, which are specified in the configuration file or the access parameters.
See Also:
bigdata-log4j.properties
Default Value
Not defined (no logging)
Syntax
[directory_object:]file_name_template
Semantics
directory_object
The Oracle directory object for the HDFS path on the Hadoop cluster where the log file is created.
file_name_template
A string used to generate file names. This table describes the optional variables that you can use in the template.
Table 7-2 Variables for com.oracle.bigdata.log.exec
Variable | Value |
---|---|
%p | Operating system process identifier (PID) |
%a | A number that uniquely identifies the process |
%% | A percent sign (%) |
Example
The following example generates log file names that include the PID and a unique number, such as xtlogp_hive14_3413_57:
com.oracle.bigdata.log.exec=xtlogp_hive14_%p_%a
7.1.4.8 com.oracle.bigdata.log.qc
Specifies how the access driver generates log files for a query.
Default Value
Not defined (no logging)
Syntax
[directory_object:]file_name_template
Semantics
directory_object
Name of an Oracle directory object that points to the path where the log files are written. If this value is omitted, then the logs are written to the default directory for the external table.
file_name_template
A string used to generate file names. Table 7-3 describes the optional variables that you can use in the string.
Table 7-3 Variables for com.oracle.bigdata.log.qc
Variable | Value |
---|---|
%p | Operating system process identifier (PID) |
%% | A percent sign (%) |
Example
This example creates log file names that include the PID and a percent sign, such as xtlogp_hive213459_%
:
com.oracle.bigdata.log.qc= xtlogp_hive21%p_%%
7.1.4.9 com.oracle.bigdata.overflow
Describes how to handle string data that is too long for the columns in the external table. The data source can be character or binary. For Hive, the data source can also be STRUCT
, UNIONTYPES
, MAP
, or ARRAY
.
Default Value
{"action":"error"}
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram:
overflow ::=
overflow_element ::=
Semantics
The "action"
, "truncate"
, "error"
, and "col"
tags must be lowercase and enclosed in quotation marks. See Syntax Rules for Specifying Properties.
"action":value
The value of "action"
can be one of the following keywords:
-
truncate
: Shortens the data to fit the column. -
error
: Throws an error. Thecom.oracle.bigdata.erroropt
property controls the result of the error.
"col":name
name
: Identifies a column in the external table. The name is case sensitive and must be enclosed in quotation marks.
Example
This example truncates the source data for the DESCRIPTION
column, if it exceeds the column width:
com.oracle.bigdata.overflow={"action":"truncate", \ "col":"DESCRIPTION"}
7.1.4.10 com.oracle.bigdata.rowformat
Provides the information the access driver needs to extract fields from the records in a file.
Important:
The com.oracle.bigdata.rowformat parameter is unrelated to the access parameter syntax of traditional external tables that use "type ORACLE_LOADER." There are keywords such as FIELDS, TERMINATED, and others that appear in both clauses, but the commonality in naming is coincidental and does not imply common functionality. The com.oracle.bigdata.rowformat access parameter is passed without change to the default Hive SerDe. The Hive SerDe that extracts columns from rows is deliberately limited. Complex cases are handled by specialized SerDes.
Default Value
DELIMITED
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram.
rowformat:
Semantics
DELIMITED
Describes the characters used to delimit the fields in a record:
- FIELDS TERMINATED BY: The character that delimits every field in the record. The optional ESCAPED BY character precedes the delimiter character when it appears within a field value.
- COLLECTION ITEMS TERMINATED BY: The character that marks the end of an array element. Used when a column is a collection or a nested record. In this case the resulting value will be a JSON array.
- MAP KEYS TERMINATED BY: The character that marks the end of an entry in a MAP field. Used when a column is a collection or a nested record. The resulting value is a JSON object.
- LINES TERMINATED BY: The character that marks the end of a record.
- NULL DEFINED AS: The character that indicates a null value.
SERDE
Identifies a SerDe that can parse the data and any properties of the SerDe that the access driver might need.
Example
This example specifies a SerDe for an Avro container file:
com.oracle.bigdata.rowformat: SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
The next example specifies a SerDe for a file containing regular expressions:
com.oracle.bigdata.rowformat=\
    SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' \
    WITH SERDEPROPERTIES \
    ("input.regex" = "(\\\\d{6}) (\\\\d{5}) (.{29}) .*")
7.1.4.11 com.oracle.bigdata.tablename
The Hive parameter com.oracle.bigdata.tablename
identifies the Hive table that contains the source data.
Default Value
DEFAULT.external_table_name
Syntax
[hive_database_name.]table_name
Semantics
The maximum length of hive_database_name and table_name is 128 UTF-8 characters (512 bytes).
hive_database_name: The Hive database where the source data resides. DEFAULT is the name of the initial Hive database.
table_name: The Hive table with the data. If you omit table_name, then ORACLE_HIVE searches for a Hive table with the same name as the external table. Table names are case-insensitive.
Example
This setting indicates that the source data is in a table named ORDER_SUMMARY
in the Hive ORDER_DB
database:
com.oracle.bigdata.tablename=ORDER_DB.ORDER_SUMMARY
7.1.5 ORACLE_BIGDATA Access Parameters
There is a set of access parameters that are common to all file formats. There are also parameters that are unique to a specific file format.
Common Access Parameters
The following table lists parameters that are common to all file formats accessed through ORACLE_BIGDATA. The first column identifies each access parameter common to all data file types. The second column describes each parameter.
Table 7-4 Common Access Parameters
Common Access Parameter | Description |
---|---|
com.oracle.bigdata.credential.name | Specifies the credential object to use when accessing data files in an object store. This access parameter is required for object store access. It is not needed for access to files through a directory object or for data stored in public buckets. The name specified for the credential must be the name of a credential object in the same schema as the owner of the table. Use DBMS_CREDENTIAL.CREATE_CREDENTIAL in the DBMS_CREDENTIAL PL/SQL package to create the credential object. |
com.oracle.bigdata.fileformat | Specifies the format of the file. The value of this parameter identifies the reader that processes the file. Each reader can support additional access parameters that may or may not be supported by other readers. Valid values: parquet, orc, textfile, avro, csv, jsondoc, jsontable. Default: csv |
com.oracle.bigdata.log.opt | Specifies whether log messages should be written to a log file. Valid values: true, false. Default: false |
com.oracle.bigdata.log.qc | Specifies the name of the log file created by the parallel query coordinator. This parameter is used only when com.oracle.bigdata.log.opt is set to true. |
com.oracle.bigdata.log.exec | Specifies the name of the log file created during query execution. This value is used (and is required) only when com.oracle.bigdata.log.opt is set to true. |
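As a hedged sketch of how the credential and file format parameters fit together (the credential name, user, token, bucket URI, and columns are all hypothetical placeholders), you might create a credential object and then reference it from an ORACLE_BIGDATA external table over an object store file:
-- Create a credential object; the values shown are placeholders.
BEGIN
  DBMS_CREDENTIAL.CREATE_CREDENTIAL(
    credential_name => 'MY_STORE_CRED',
    username        => 'store_user',
    password        => 'auth_token_value');
END;
/

-- Reference the credential from the access parameters.
CREATE TABLE sales_obj
  (sale_id   NUMBER,
   sale_desc VARCHAR2(100))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_BIGDATA
   DEFAULT DIRECTORY DEFAULT_DIR
   ACCESS PARAMETERS
   (
    com.oracle.bigdata.credential.name=MY_STORE_CRED
    com.oracle.bigdata.fileformat=csv
   )
   LOCATION ('https://objectstorage.us-phoenix-1.oraclecloud.com/n/mytenant/b/mybucket/o/sales.csv')
  )
REJECT LIMIT UNLIMITED;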
Avro-Specific Access Parameters
In addition to common access parameters, there are some that are only valid for the Avro file format. The first column in this table identifies the access parameters specific to the Avro file format and the second column describes the parameter. There is only one Avro-specific parameter at this time.
Table 7-5 Avro-Specific Access Parameters
Avro-Specific Parameter | Description |
---|---|
|
Specifies the representation of a decimal stored in the byte array. Valid values:
Default: If this parameter is not used, an Avro decimal column is read assuming byte arrays store the numerical representation of the values (that is, the default is int). |
Parquet-Specific Access Parameters
Some access parameters are only valid for the Parquet file format. The first column in this table identifies the access parameters specific to the Parquet file format and the second column describes the parameter.
Table 7-6 Parquet-Specific Access Parameters
Parquet-Specific Access Parameter | Description |
---|---|
|
This is a Boolean property that specifies if the binary is stored as a string. Valid values:
Default:
|
|
This is a Boolean property that specifies if
Valid values:
Default:
|
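As a brief, hypothetical sketch (directory, columns, and file name are invented), a Parquet source typically needs only the fileformat parameter:
CREATE TABLE events_pq
  (event_id NUMBER,
   payload  VARCHAR2(4000))
ORGANIZATION EXTERNAL
  (TYPE ORACLE_BIGDATA
   DEFAULT DIRECTORY DEFAULT_DIR
   ACCESS PARAMETERS
   (
    com.oracle.bigdata.fileformat=parquet
   )
   LOCATION ('events.parquet')
  )
REJECT LIMIT UNLIMITED;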
Textfile and CSV-Specific Access Parameters
The text file and comma-separated value (csv) file formats are similar to the Hive text file format. They read text and csv data from delimited files. ORACLE_BIGDATA automatically detects the line terminator (either \n, \r, or \r\n). By default, it assumes that the fields in the file are separated by commas and that the order of the fields in the file matches the order of the columns in the external table.
Example 7-3 CSV Data File
This is a simple csv
example. The data file has
comma-separated values, with optional enclosing quotes.
–----Source csv data in t.dat
t.dat:
1,"abc",
2,xyz,
–---------Create an external table over the csv source data in t.dat
CREATE TABLE t
(
c0 number,
c1 varchar2(20)
)
ORGANIZATION external
(
TYPE oracle_bigdata
DEFAULT DIRECTORY DMPDIR
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=csv
)
location
(
't.dat'
)
)REJECT LIMIT 1
;
–------Select data from external table
select c0, c1 from t;
C0 C1
---- -----
1 abc
2 xyz
Example 7-4 CSV Data File
This example shows how to create an external table over a
csv
data source, which has '|
' as the field
separator, the data file compressed with gzip
, blanks as null, and
a date format.
–----The source csv data in t.dat
t.dat:
1| |
2|Apr-99-30|
–------Create an external table over the csv data source in t.dat
CREATE TABLE t(
c0 number,
c1 date
)
ORGANIZATION external
(
TYPE oracle_bigdata
DEFAULT DIRECTORY DMPDIR
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=textfile
com.oracle.bigdata.compressiontype=gzip
com.oracle.bigdata.csv.rowformat.separatorcharacter='|'
com.oracle.bigdata.blankasnull=true
com.oracle.bigdata.dateformat="MON-RR-DD HH:MI:SS"
)
location
(
't.dat.gz'
)
)REJECT LIMIT 1
;
--Select csv data from external table
SQL> select c0, c1 from t;
C0 C1
------ ---------
1
2 30-APR-99
Example 7-5 JSON Data File - Map JSON values to columns
This approach works well for simple JSON documents. The expectation is that the rows will primarily consist of a single object with multiple key/value pairs. Like ordinary tables, the column data type has to match the data type of the value found in the file. In cases where the document contains values with nesting, that is, complex values, the parser outputs the value as a JSON document. However, this is stored as a string. You specify this mapping strategy via the access parameter com.oracle.bigdata.fileformat=jsontable.
As an example of this mapping scheme, consider the following UC1 file:
{"ts":1603802918441,"level":"DEBUG1","component":"resp","cell-id":"cloudsql-v1-12"}
{"ts":1603802918442,"level":"DEBUG2","component":[{"tst":1},{"tst":2},{"tst":3}],"cellid":"cloudsql-v1-12"}
{"ts":1603802918443,"level":"DEBUG3","component":"resp","cell-id":"cloudsql-v1-12"}
{"ts":1603802918444,"level":"DEBUG4","component":"resp","cell-id":"cloudsql-v1-12"}
We could create the following external table:
CREATE TABLE logs(
ts number(20),
lvl varchar(35),
component varchar(50),
cellid varchar2(20)
) ORGANIZATION EXTERNAL
(TYPE ORACLE_BIGDATA
DEFAULT DIRECTORY default_dir
ACCESS PARAMETERS
(com.oracle.bigdata.fileformat = jsontable)
location ('nested.json'));
Then, we can query the table as follows:
SQL> Select * from logs;
TS LVL COMPONENT CELLID
---------- --------- -------------------------------- -----------------
1.6038E+12 DEBUG1 resp cloudsql-v1-12
1.6038E+12 DEBUG2 [{"tst":1},{"tst":2},{"tst":3}] cloudsql-v1-12
1.6038E+12 DEBUG3 resp cloudsql-v1-12
1.6038E+12 DEBUG4 resp cloudsql-v1-12
4 rows selected.
Note that the value of the component column in row 2 is represented as a JSON document. As already mentioned, this is stored as a varchar in v1. In general, the data type mapping strategy relies on the data types defined in the table DDL. When there is a data type mismatch, the value is rejected and processing continues. For example, consider the following scenario:
{"Customer ID", 123456}
{"Customer ID", "123456"}
{"Customer ID", true}
If the table column is defined as a number, row 3 above produces a conversion error and the value is rejected. However, the quoted value in row 2 is converted without error.
Example 7-6 JSON Data File - Entire row is mapped to a single column.
In this mapping strategy, the entire JSON document is mapped to a single column of type JSON. You specify this mapping strategy via the access parameter com.oracle.bigdata.fileformat=jsondoc.
As an example of this mapping scheme, consider the following UC1 file:
{"ts":1603802918441,"level":"DEBUG1","component":"resp","cell-id":"cloudsql-v1-12"}
{"ts":1603802918442,"level":"DEBUG2","component":[{"tst":1},{"tst":2},{"tst":3}],"cellid":"cloudsql-v1-12"}
{"ts":1603802918443,"level":"DEBUG3","component":"resp","cell-id":"cloudsql-v1-12"}
{"ts":1603802918444,"level":"DEBUG4","component":"resp","cell-id":"cloudsql-v1-12"}
We could create the following external table:
CREATE TABLE logs (data JSON)
ORGANIZATION EXTERNAL
(TYPE ORACLE_BIGDATA
DEFAULT DIRECTORY default_dir
ACCESS PARAMETERS
(com.oracle.bigdata.fileformat = jsondoc)
location ('nested.json'));
Then, we can query the external table:
SQL> select l.data."cell-id", l.data.component[1].tst
from logs l
where l.data.ts = 1603802918442;
cell-id tst
--------------------------------
cloudsql-v1-12 2
In the example above, we selected the component and cell-id values of the second row (TS = 1603802918442). Notice that the component value of row 2 is an array, so we query the second item of that array, that is, COMPONENT[1].
Example 7-7 JSON Data File
{"id":"72","name":"George","lat":40.76727216,"lon":-73.99392888,"segments":["wealthy","middle-aged"],"age":45}
{"id":"79","name":"Lucy","lat":40.71911552,"lon":-74.00666661,"segments":["married","yes"],"age":33}
— Create the external table over Json source
CREATE TABLE people (
data json
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_BIGDATA
ACCESS PARAMETERS(
com.oracle.bigdata.fileformat=jsondoc
)
LOCATION ('https://swftobjectstorage.us-ashburn-1.oraclecloud.com/n/mytenant/b/people/o/data.json')
)
REJECT LIMIT UNLIMITED;
–---Select data from external table
select s.data.id,
s.data.name,
s.data.segments[0]
from people s;
id Name segments[0]
–--- –---- –-----------------------------------
72 George wealthy
79 Lucy married
Textfile, CSV-Specific and JSON-Specific Access Parameters
Table 7-7 Textfile and CSV-Specific Access Parameters
Textfile-Specific Access Parameter | Description |
---|---|
|
Specifies the size of the input/output (I/O) buffer used for reading the file. The value is the size of the buffer in kilobytes. Note that the buffer size is also the largest size that a record can be. If a format reader encounters a record larger than this value, then it will return an error. Default:
|
|
When set to Valid values: Default: Example:
|
|
Specifies the character set of source files. Valid values: Default: Example:
|
|
If this parameter is specified, then the code tries
to decompress the data according to the compression scheme
specified.
Valid values:
Default: no compression If
|
|
If a row has data type conversion errors,
then the related columns are stored as null, or the row is
rejected. Valid values:
Default:
Example:
|
|
Specifies the character used to indicate the value
of a field is |
|
Specifies the character used to separate the field
values. The character value must be wrapped in single-quotes.
Example: Default:
|
|
Specifies the character used to escape any embedded
field terminators or line terminators in the value for fields.
The character value must be wrapped in single quotes. Example:
|
|
Specifies the date format in the source file. The
format option
Default: Example:
|
|
Specifies the order of fields in the data file. The
values are the same as for |
|
Blank lines are ignored when set to true. Valid values: Default: Example:
|
|
Missing columns are stored as null. Valid values: Default: Example:
|
com.oracle.bigdata.json.ejson |
Specifies whether to enable extended JSON. Valid values: Default: Example:
|
com.oracle.bigdata.json.path |
A JSON path expression which identifies a sequence of nested JSON values which will be mapped to table rows. Valid values: String property Default: Example: |
com.oracle.bigdata.json.unpackarrays |
Specifies whether to unbox the array found in JSON files. The file consists of an array of JSON objects. The entire file is a grammatically valid JSON doc. An example of such a file is [{“a”:1},{“a”:2},{“a”:3}]. Valid values: Default: Example:
|
|
Specifies the quote character for the fields. The quote characters are removed during loading when specified. Valid values: character Default: Null, meaning no quote Example:
|
|
The operation errors out after specified number of rows are rejected. This only applies when rejecting records due to conversion errors. Valid values: number Default: Example: |
com.oracle.bigdata.removequotes |
Removes any quotes that are around any field in the source file. Valid values: Default: Example: |
|
Specifies how many rows should be skipped from the start of the files. Valid values: number Default: Example:
|
|
Specifies the timestamp format in the source file.
The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with local timezone format in
the source file. The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with timezone format in the
source file. The format option
Valid values: Default: Example:
|
|
Specifies how the leading and trailing spaces of the fields are trimmed. Valid values: Default: Example:
|
|
If the data in the file is too long for a field, then this option truncates the value of the field rather than rejecting the row or setting the field to NULL. Valid values: Default: Example:
|
7.2 Static Data Dictionary Views for Hive
The Oracle Database catalog contains several static data dictionary views for Hive tables. You can query these data dictionary views to discover information about the Hive tables that you can access.
For you to access any Hive databases from Oracle Database, you must have read privileges on the ORACLE_BIGDATA_CONFIG
directory object.
7.2.1 ALL_HIVE_DATABASES
ALL_HIVE_DATABASES
describes all databases in the Hive metastore accessible to the current user.
Related Views
- DBA_HIVE_DATABASES describes all the databases in the Hive metastore.
- USER_HIVE_DATABASES describes the databases in the Hive metastore owned by the current user.
Column | Datatype | NULL | Description |
---|---|---|---|
|
|
|
Hadoop cluster where the Hive metastore is located |
|
|
|
Hive database name |
|
|
Hive database description |
|
|
|
|
|
|
|
Hive database URI |
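For example, a quick check of the Hive databases visible to you (a sketch that assumes the CLUSTER_ID and DATABASE_NAME columns of this view; output depends on your metastore):
SELECT cluster_id, database_name
FROM   all_hive_databases;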
7.2.2 ALL_HIVE_TABLES
ALL_HIVE_TABLES
describes all tables in the Hive metastore accessible to the current user.
The Oracle Big Data SQL configuration must identify the default Hive database for the current user. The current user must also have READ
privileges on the ORA_BIGSQL_CONFIG
database directory. See "About the bigdata_config Directory".
Related Views
- DBA_HIVE_TABLES describes all tables in the Hive metastore.
- USER_HIVE_TABLES describes the tables in the database owned by the current user in the Hive metastore.
Column | Datatype | NULL | Description |
---|---|---|---|
|
|
|
Hadoop cluster where the Hive metastore is located |
|
|
|
Name of the Hive database |
|
|
|
Name of the Hive table |
|
|
||
|
|
Number of columns in the Hive table |
|
|
|
Time when the table was created |
|
|
|
Time of most recent access |
|
|
|
Owner of the Hive table |
|
|
|
|
Type of Hive table, such as external or managed |
|
|
Whether the table is partitioned ( |
|
|
|
Number of partitions |
|
|
|
Input format |
|
|
|
Output format |
|
|
|
SerDe serialization information |
|
|
|
Whether the table is compressed ( |
|
|
|
Hive database URI |
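For example, to list the Hive tables visible in a given cluster (a sketch that assumes the CLUSTER_ID, DATABASE_NAME, and TABLE_NAME columns of this view; the cluster name is hypothetical):
SELECT database_name, table_name
FROM   all_hive_tables
WHERE  cluster_id = 'hadoop1';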
7.2.3 ALL_HIVE_COLUMNS
ALL_HIVE_COLUMNS
describes the columns of all Hive tables accessible to the current user.
The Oracle Big Data SQL configuration must identify the default Hive database for the current user. The current user must also have READ
privileges on the ORA_BIGSQL_CONFIG
database directory. See "About the bigdata_config Directory".
Related Views
- DBA_HIVE_COLUMNS describes the columns of all tables in the Hive metastore.
- USER_HIVE_COLUMNS describes the columns of the tables in the Hive database owned by the current user.
Column | Datatype | NULL | Description |
---|---|---|---|
|
|
|
Hadoop cluster where the Hive metastore is located |
|
|
|
Name of the Hive database; if blank, then the default database |
|
|
|
Name of the Hive table |
|
|
|
Name of the Hive column |
|
|
|
Data type of the Hive column |
|
|
|
Oracle data type equivalent to Hive data type |
|
|
||
|
|
Owner of the Hive table |
|
|
|
Time when the table was created |
|
|
|
Hive database URI |
7.2.4 DBA_HIVE_DATABASES
DBA_HIVE_DATABASES describes all the databases in the Hive metastore. Its columns are the same as those in ALL_HIVE_DATABASES.
7.2.5 DBA_HIVE_TABLES
DBA_HIVE_TABLES describes all tables in the Hive metastore. Its columns are the same as those in ALL_HIVE_TABLES.
The Oracle Big Data SQL configuration must identify the default Hive database for the current user. See "About the bigdata_config Directory".
7.2.6 DBA_HIVE_COLUMNS
DBA_HIVE_COLUMNS describes the columns of all tables in the Hive metastore. Its columns are the same as those in ALL_HIVE_COLUMNS.
7.2.7 USER_HIVE_DATABASES
USER_HIVE_DATABASES describes the databases in the Hive metastore owned by the current user. Its columns (except for OWNER) are the same as those in ALL_HIVE_DATABASES.
7.2.8 USER_HIVE_TABLES
USER_HIVE_TABLES describes the tables in the database owned by the current user in the Hive metastore. Its columns (except for OWNER) are the same as those in ALL_HIVE_TABLES.
The Oracle Big Data SQL configuration must identify the default Hive database for the current user. The current user must also have READ
privileges on the ORA_BIGSQL_CONFIG
database directory. See "About the bigdata_config Directory".
7.2.9 USER_HIVE_COLUMNS
USER_HIVE_COLUMNS describes the columns of the tables in the Hive database owned by the current user. Its columns (except for OWNER) are the same as those in ALL_HIVE_COLUMNS.
The Oracle Big Data SQL configuration must identify the default Hive database for the current user. The current user must also have READ
privileges on the ORA_BIGSQL_CONFIG
database directory. See "About the bigdata_config Directory".
7.3 DBMS_BDSQL PL/SQL Package
The DBMS_BDSQL PL/SQL package contains procedures to add and remove a user map.
This appendix contains the following sections:
In previous releases of Oracle Big Data SQL, all queries against Hadoop and Hive data were executed as the oracle user, and there was no option to change users. Although oracle is still the underlying user in all cases, you can now use Multi-User Authorization (based on Hadoop Secure Impersonation) to direct the oracle account to execute tasks on behalf of other designated users. This enables HDFS data access based on the user that is currently executing the query, rather than the singular oracle user.
The DBMS_BDSQL package enables you to provide rules for identifying the currently connected user and to map the connected user to the user that is impersonated. Because there are numerous ways in which users can connect to Oracle Database, this user may be a database user, a user sourced from LDAP, from Kerberos, and so forth. Authorization rules on the files apply for that user and audits will reflect that user as well.
Note:
Grant the new BDSQL_ADMIN role to designated administrators in order to allow them to invoke these procedures.
7.3.1 ADD_USER_MAP
Use the ADD_USER_MAP procedure to specify the rules for identifying the actual user who is running the query.
At query execution time, the database performs a lookup on the BDSQL_USER_MAP table to determine the current database user (current_database_user
). It then uses the syscontext_namespace
and syscontext_parm_hadoop_user
parameters to identify the actual user.
Syntax
procedure ADD_USER_MAP (
    cluster_name                IN VARCHAR2 DEFAULT '[DEFAULT]',
    current_database_user       IN VARCHAR2 NOT NULL,
    syscontext_namespace        IN VARCHAR2 DEFAULT NULL,
    syscontext_parm_hadoop_user IN VARCHAR2 NOT NULL
);
Table 7-8 ADD_USER_MAP Parameters
Parameter | Description |
---|---|
cluster_name |
The name of the Hadoop cluster where the map will be applied. [DEFAULT] as cluster name designates the default cluster.
|
current_database_user |
The current effective database user. This is what Oracle uses to check for authority. A value of '*' indicates that this row to be used if no other rows fit the criteria. There is no default and the value may not be NULL. |
syscontext_namespace |
Note that for the Oracle USERENV namespace, the only allowed values are GLOBAL_UID , CLIENT_IDENTIFIER , and AUTHENTICATED_IDENTITY .
If your Oracle Database installation uses Kerberos credentials, SSL, Active Directory, or LDAP for authentication, it is likely that your Hadoop system uses the same authentication framework. In that case, |
syscontext_parm_hadoop_user |
The Hadoop user that will impersonate the current database user. |
Note:
The values for current_database_user
and syscontext_parm_hadoop_user
can be the single asterisk character (*) or any string that meets the requirements of Oracle simple_sql_name
assertion:
- The name must begin with an alphabetic character. It may contain alphanumeric characters as well as the characters _, $, and # in the second and subsequent character positions.
- Quoted SQL names are also allowed.
- Quoted names must be enclosed in double quotes.
- Quoted names allow any characters between the quotes.
- Quotes inside the name are represented by two quote characters in a row, for example, "a name with "" inside" is a valid quoted name.
- The input parameter may have any number of leading and/or trailing white space characters.
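Example
The following call (all names are illustrative) maps the database user SCOTT directly to the Hadoop user hdpusr1, matching the mapping shown in the security table examples later in this chapter:
BEGIN
  DBMS_BDSQL.ADD_USER_MAP(
    cluster_name                => 'hadoop_cl_1',
    current_database_user       => 'SCOTT',
    syscontext_namespace        => NULL,
    syscontext_parm_hadoop_user => 'hdpusr1');
END;
/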
7.3.2 REMOVE_USER_MAP
Use REMOVE_USER_MAP to remove a row from the BDSQL_USER_MAP table. This disassociates a specific Oracle Database user from a specific Hadoop user.
Syntax
procedure REMOVE_USER_MAP (
    cluster_name          IN VARCHAR2 DEFAULT '[DEFAULT]',
    current_database_user IN VARCHAR2 NOT NULL
);
See Also:
The reference page for ADD_USER_MAP describes the cluster_name
and current_database_user
parameters.
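Example
A brief sketch (names are illustrative) that removes the mapping created in the ADD_USER_MAP example:
BEGIN
  DBMS_BDSQL.REMOVE_USER_MAP(
    cluster_name          => 'hadoop_cl_1',
    current_database_user => 'SCOTT');
END;
/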
7.3.3 Multi-User Authorization Security Table
SYS.BDSQL_USER_MAP
is the multi-user authorization security table.
Use the procedures ADD_USER_MAP
and REMOVE_USER_MAP
to update this table.
The primary key is (cluster_name, current_database_user
).
Table 7-9 SYS.BDSQL_USER_MAP
Column | Datatype | Description |
---|---|---|
cluster_name |
varchar2 | Name of the Hadoop cluster. The default is [DEFAULT] .
|
current_database_user |
varchar2 | The current effective database user (no default, not NULL). Oracle uses this column to check for the authorization rule that corresponds to the given Oracle Database user. A value of '* ' in a row is a directive to use this row if no other rows fit the criteria.
|
syscontext_namespace |
varchar2 | This is the optional specification for the Oracle SYS_CONTEXT namespace, if customer security is set up. Note that for the Oracle USERENV namespace, the only allowed values are: ‘GLOBAL_UID’, ‘CLIENT_IDENTIFIER’, ‘AUTHENTICATED_IDENTITY’.
|
syscontext_parm_hadoop_user |
varchar2 | This column value has alternate interpretations.
|
Here a customer is using Active Directory, Kerberos, SSL, or LDAP for logon authentication against the database. AUTHENTICATED_IDENTITY is specified in this case because the customer uses the same Active Directory framework for Hadoop user management.
The example below is similar to running the following SQL query for the currently connected user:
select sys_context('USERENV', 'AUTHENTICATED_IDENTITY') from dual;
In this example, only the username (without the “@<domain
>” segment) is used for authorization on the Hadoop cluster. There may also be cases where the format of AUTHENTICATED_IDENTITY
is <username>/<cluster>@<domain_name>
.
cluster_name | current_database_user | syscontext_namespace | syscontext_parm_hadoop_user |
---|---|---|---|
[DEFAULT] | * | USERENV | AUTHENTICATED_IDENTITY |
In this example, “HRAPP” is an HR application that always connects to the database using the HRAPP database user and then programmatically sets the application user through the DBMS_SESSION.SET_IDENTIFIER procedure. There are a number of “lightweight” users who are designated with CLIENT_IDENTIFIER (as in sys_context('USERENV', 'CLIENT_IDENTIFIER'), which is similar to running select sys_context('USERENV', 'CLIENT_IDENTIFIER') from dual;).
The current database has other effective users who are enterprise users with logons managed by Oracle Internet Directory for Enterprise User Security. In these cases, the GLOBAL_UID
is used for Hadoop impersonation.
cluster_name | current_database_user | syscontext_namespace | syscontext_parm_hadoop_user |
---|---|---|---|
[DEFAULT] | HRAPP | USERENV | CLIENT_IDENTIFIER |
[DEFAULT] | * | USERENV | GLOBAL_UID |
In this example, BIAPP is a business intelligence application whose own context is its username. For customers using the application, their designated ID is used for Hadoop access. In other words, when the effective current user is 'BIAPP', we use sys_context('BIVPD','USERID')
for the impersonation. For the rest of the users, we simply designate [DEFAULT] * *
in order to use their current database username for the impersonation.
cluster_name | current_database_user | syscontext_namespace | syscontext_parm_hadoop_user |
---|---|---|---|
[DEFAULT] | BIAPP | BIVPD | USERID |
[DEFAULT] | * | * |
In this example, the Oracle username SCOTT is impersonated by the hdpusr1
Hadoop user for HDFS access. The user ADAM is impersonated by hdpusr2
for HDFS access.
All other users have more limited access, so we use a syscontext_parm_hadoop_user value of 'lowprivuser' to designate these users.
cluster_name | current_database_user | syscontext_namespace | syscontext_parm_hadoop_user |
---|---|---|---|
hadoop_cl_1 | SCOTT | | hdpusr1 |
hadoop_cl_1 | ADAM | | hdpusr2 |
hadoop_cl_1 | * | | lowprivuser |
7.4 DBMS_BDSQS PL/SQL Package
With the DBMS_BDSQS PL/SQL package you can gather external table statistics and synchronize external tables with corresponding Hive tables.
7.4.1 GATHER_EXTERNAL_TABLE_STATS
This is the PL/SQL interface for manually gathering statistics on Oracle Big Data SQL external tables (ORACLE_HDFS, ORACLE_HIVE, ORACLE_BIGDATA).
The behavior and parameters are identical to those of DBMS_STATS.GATHER_TABLE_STATS, with the exception that the owner of the table must be the session user running the procedure, and the statistics gathered using this procedure persist after a restart. This procedure cannot be used on external tables that are automatically synced from Hive. See DBMS_STATS.GATHER_TABLE_STATS Procedure.
Syntax
PROCEDURE gather_external_table_stats(
tabname varchar2,
partname varchar2 default null,
estimate_percent number default dbms_stats.DEFAULT_ESTIMATE_PERCENT,
block_sample boolean default FALSE,
method_opt varchar2 default dbms_stats.DEFAULT_METHOD_OPT,
degree number default dbms_stats.DEFAULT_DEGREE_VALUE,
granularity varchar2 default dbms_stats.DEFAULT_GRANULARITY,
cascade boolean default dbms_stats.DEFAULT_CASCADE,
stattab varchar2 default null,
statid varchar2 default null,
statown varchar2 default null,
no_invalidate boolean default dbms_stats.to_no_invalidate_type(dbms_stats.get_param('NO_INVALIDATE')),
stattype varchar2 default 'DATA',
force boolean default FALSE,
options varchar2 default dbms_stats.DEFAULT_OPTIONS
)
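Example
A brief sketch (the table name is hypothetical); the external table must be owned by the session user running the call:
BEGIN
  DBMS_BDSQS.GATHER_EXTERNAL_TABLE_STATS(tabname => 'SALES_HDFS');
END;
/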
7.4.2 SYNC_HIVE_DATABASES
This is the PL/SQL interface for initiating a Hive synchronization operation.
Calls to this procedure perform a delta synchronization on Hive databases that have already been synced with the Query Server by a previous Cloudera or Ambari command. A delta synchronization applies changes on existing Hive databases, for instance table additions/deletions and table column additions and deletions, to the schemas and external tables already created on the Query Server. More extensive synchronization operations, such as getting a new Hive database, can only be performed by Cloudera or Ambari commands.
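Example
A minimal sketch, assuming the procedure takes no arguments:
EXEC DBMS_BDSQS.SYNC_HIVE_DATABASES;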
7.5 DBMS_BDSQS_ADMIN PL/SQL Package
This package contains procedures to manage database and Kerberos users for the Hadoop cluster.
7.5.1 ADD_DATABASE_USERS
Use this PL/SQL procedure to add database users to Query Server for secure clusters. You can then connect to Query Server using database-managed users instead of Kerberos principals. The procedure performs the following:
- Assignment of all the internal roles and privileges required to provide users with access to external tables.
- "Saving" of the new users so that they are not lost when the Query Server restarts.
Note:
Using CREATE USER to add new users is not supported on the Query Server.
Users created with ADD_DATABASE_USERS must be dropped using DROP_DATABASE_USERS in order to be cleanly removed. See DROP_DATABASE_USERS.
The ADD_DATABASE_USERS procedure does not support setting of user passwords. Users are created using NO AUTHENTICATION and are locked. After invoking ADD_DATABASE_USERS, an administrator needs to manually issue ALTER USER statements to set the user password and unlock the user. User passwords can be set normally using ALTER USER statements, and the changed passwords are preserved after a Query Server restart.
The ADD_DATABASE_USERS procedure must be run by SYS or by a user with the BDSQL_ADMIN role.
Syntax
procedure add_database_users (
users_list VARCHAR2,
op_semantics VARCHAR2 DEFAULT 'STOP_ON_FIRST_ERROR');
Parameters for ADD_DATABASE_USERS
Parameter | Description |
---|---|
users_list | The name(s) of the database users that you want to add to the Query Server. These users are the same database users that will be accessing your Hadoop cluster. To specify several database users, provide a comma-separated list of database user names enclosed in a single quoted string. The names must adhere to Oracle's rules for database object names, exactly as enforced by the CREATE USER statement. |
op_semantics |
The operation semantics (os) parameter can be one of
the following options. All failed additions can be reviewed in
the
|
Error Handling
Failed operations are logged in the sys.bdsqs_dbusers_errors$ log table. This PL/SQL procedure may encounter two types of errors:
- Parse errors, where the users_list cannot be processed. For example, when the list contains mismatched double quotes. Parse errors cause the entire call to fail.
- Processing errors, which occur when executing the call. The error behavior for processing errors is configurable using the op_semantics parameter.
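Example
A hedged sketch (user names and the password are placeholders): add two users, then set a password and unlock one of them as described in the note above:
EXEC DBMS_BDSQS_ADMIN.ADD_DATABASE_USERS('analyst1,analyst2');
ALTER USER analyst1 IDENTIFIED BY "Change_Me#123" ACCOUNT UNLOCK;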
7.5.2 ADD_KERBEROS_PRINCIPALS
This procedure adds Query Server Kerberos principals. Query Server users are selected from the same set of Kerberos principals that have access to the Hadoop cluster. The users are authenticated using Kerberos; passwords do not need to be managed in the Query Server.
Syntax
DBMS_BDSQS_ADMIN.ADD_KERBEROS_PRINCIPALS(
principals_list varchar2,
op_semantics varchar2 DEFAULT 'STOP_ON_FIRST_ERROR'
)
Table 7-10 Parameters for ADD_KERBEROS_PRINCIPALS
Parameter | Description |
---|---|
principals_list | The name(s) of the Query Server users that you want to add. These Query Server users are the same Kerberos principals that will be accessing your Hadoop cluster. To specify several Kerberos principals, provide a comma-separated list of the principals enclosed in a single quoted string. |
op_semantics |
The operation semantics (os) parameter can be one of the following options. All failed additions can be reviewed in the sys.bdsqs_kerberos_errors$ log table.
|
Usage Notes
The schema for bdsqs_kerberos_errors$ is:
CREATE TABLE sys.bdsqs_kerberos_errors$
( kerberos_principal VARCHAR2(4000), /* principal name */
operation VARCHAR2(4), /* ADD, DROP or SYNC*/
error_code NUMBER, /* Error number emitted during operation */
error_message VARCHAR2(4000), /* Emitted error message */
additional_info VARCHAR2(4000) /* Not used yet */
);
The bdsqs_kerberos_errors$ log table will contain the exception logs for the following situations:
- When you invoke the add_kerberos_principals or the drop_kerberos_principals procedures.
- When you run the jaguar install command on a secure cluster with the sync_principals parameter set to true in the kerberos section in the bds-config.json (operation SYNC).
- When you run the jaguar sync_principals (operation SYNC).
The bdsqs_kerberos_errors$ log table is cleared before invoking any of the above commands. Querying bdsqs_kerberos_errors$ after a successful invocation of the procedures should return no results.
These are the fields in a bdsqs_kerberos_errors$ record and the possible values for each field.
bdsqs_kerberos_errors$ Record Field | Possible Values |
---|---|
OPERATION | ADD, DROP, SYNC or UNDO |
ERROR_CODE | Numeric code (SQLCODE) of the raised exception |
ERROR_MESSAGE | Error message (SQLERRM) of the raised exception |
ADDITIONAL_INFO | Additional information pertinent to the raised exception. Currently not used. |
Examples
The following example creates two database users identified externally by the Kerberos principals: princ1@REALM.COM and princ2@REALM.COM. The default STOP_ON_FIRST_ERROR parameter is used.
exec dbms_bdsqs_admin.add_kerberos_principals('princ1@REALM.COM,
princ2@REALM.COM');
The following example creates two database users for the valid princ3@REALM.COM and princ4@REALM.COM Kerberos principals; however, since princ4 is not a valid Kerberos principal, it raises an exception, and adds the error log to the sys.bdsqs_kerberos_errors$ table.
exec dbms_bdsqs_admin.add_kerberos_principals('princ3@REALM.COM, princ4, princ4@REALM.COM','ABSORB_ERRORS');
The following example creates a database user for princ5@REALM.COM, and it raises an exception since princ6 is not a valid Kerberos principal name. It also adds the error log to the sys.bdsqs_kerberos_errors$ table.
exec dbms_bdsqs_admin.add_kerberos_principals('princ5@REALM.COM,
princ6','STOP_ON_FIRST_ERROR');
The following example does not create any database users because princ8 is not a valid Kerberos principal name and because the UNDO_CHANGES parameter is used. Initially, the procedure creates a database user for princ7@REALM.COM; however, it identifies princ8 as an invalid Kerberos principal name. Since the UNDO_CHANGES parameter is used, the procedure removes the database user it created for princ7@REALM.COM, raises an exception, and adds the error log to the sys.bdsqs_kerberos_errors$ table.
exec dbms_bdsqs_admin.add_kerberos_principals('princ7@REALM.COM, princ8','UNDO_CHANGES');
7.5.3 DROP_DATABASE_USERS
This is the PL/SQL interface for manually dropping database users.
Users created using ADD_DATABASE_USERS must be dropped using DROP_DATABASE_USERS in order to be cleanly removed. See ADD_DATABASE_USERS.
Syntax
procedure drop_database_users(
users_list VARCHAR2,
op_semantics VARCHAR2 DEFAULT 'STOP_ON_FIRST_ERROR'
);
Parameters for DROP_DATABASE_USERS
Parameter | Description |
---|---|
users_list | The name(s) of the database users that you want to drop from the Query Server. These users are the same database users that have access to your Hadoop cluster. To specify several database users, provide a comma-separated list of database user names enclosed in a single quoted string. |
op_semantics |
The operation semantics (os)
parameter can be one of the following options. All
failed additions can be reviewed in the
|
Error Handling
Failed operations are logged in the sys.bdsqs_dbusers_errors$ log table. This PL/SQL procedure may encounter two types of errors:
- Parse errors, where the users_list cannot be processed. For example, when the list contains mismatched double quotes. Parse errors cause the entire call to fail.
- Processing errors, which occur when executing the call. The error behavior for processing errors is configurable using the op_semantics parameter.
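Example
A brief sketch (the user name is a placeholder) that drops a user previously added with ADD_DATABASE_USERS:
EXEC DBMS_BDSQS_ADMIN.DROP_DATABASE_USERS('analyst1');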
7.5.4 DROP_KERBEROS_PRINCIPALS
This procedure drops Query Server users. Query Server users are selected from the same set of Kerberos principals that have access to the Hadoop cluster. Errors in the procedure are logged to bdsqs_kerberos_errors$.
Syntax
DBMS_BDSQS_ADMIN.DROP_KERBEROS_PRINCIPALS(
principals_list varchar2,
op_semantics varchar2 DEFAULT 'STOP_ON_FIRST_ERROR'
)
Parameters for DROP_KERBEROS_PRINCIPALS
Parameter | Description |
---|---|
principals_list | The name(s) of the Query Server users that you want to drop. These Kerberos principals will no longer have access to Query Server. To specify several Kerberos principals, provide a comma-separated list of the principals enclosed in a single quoted string. |
op_semantics |
The operation semantics (os) parameter can be one of the following options. All failed drops can be reviewed in the sys.bdsqs_kerberos_errors$ log table.
|
Usage Notes
See ADD_KERBEROS_PRINCIPALS for usage notes details.
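Example
A brief sketch (the principal names are placeholders):
EXEC DBMS_BDSQS_ADMIN.DROP_KERBEROS_PRINCIPALS('princ1@REALM.COM,princ2@REALM.COM');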
7.6 DBMS_HADOOP PL/SQL Package
The DBMS_HADOOP
package contains a function to generate the CREATE EXTERNAL TABLE
DDL for a Hive table:
7.6.1 CREATE_EXTDDL_FOR_HIVE
This function returns a SQL CREATE TABLE ORGANIZATION EXTERNAL
statement for a Hive table. It uses the ORACLE_HIVE
access driver.
Syntax
DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE (
   cluster_id      IN  VARCHAR2,
   db_name         IN  VARCHAR2 := NULL,
   hive_table_name IN  VARCHAR2,
   hive_partition  IN  BOOLEAN,
   table_name      IN  VARCHAR2 := NULL,
   perform_ddl     IN  BOOLEAN DEFAULT FALSE,
   text_of_ddl     OUT VARCHAR2
);
Parameters
Table 7-11 CREATE_EXTDDL_FOR_HIVE Function Parameters
Parameter | Description |
---|---|
cluster_id | Hadoop cluster where the Hive metastore is located |
db_name | Name of the Hive database |
hive_table_name | Name of the Hive table |
hive_partition | Whether the table is partitioned (TRUE) or not (FALSE) |
table_name | Name of the Oracle external table to be created. It cannot already exist. |
perform_ddl | Whether to execute the generated CREATE TABLE statement (TRUE) or only return its text (FALSE, the default). Do not execute the command automatically if you want to review or modify it. |
text_of_ddl | The generated CREATE TABLE ORGANIZATION EXTERNAL statement |
Usage Notes
The Oracle Database system must be configured for Oracle Big Data SQL. See "About Oracle Big Data SQL on the Database Server (Oracle Exadata Machine or Other)".
The data type conversions are based on the default mappings between Hive data types and Oracle data types. See "Hive to Oracle Data Type Conversions".
7.6.1.1 Example
The following query returns the CREATE EXTERNAL TABLE
DDL for my_hive_table
from the default Hive database. The connection to Hive is established using the configuration files in the ORACLE_BIGDATA_CONFIG
directory, which identify the location of the HADOOP1 cluster.
DECLARE
   DDLtxt VARCHAR2(4000);
BEGIN
   dbms_hadoop.create_extddl_for_hive(
     CLUSTER_ID=>'hadoop1',
     DB_NAME=>'default',
     HIVE_TABLE_NAME=>'my_hive_table',
     HIVE_PARTITION=>FALSE,
     TABLE_NAME=>'my_xt_oracle',
     PERFORM_DDL=>FALSE,
     TEXT_OF_DDL=>DDLtxt
   );
   dbms_output.put_line(DDLtxt);
END;
/
The query returns the text of the following SQL command:
CREATE TABLE my_xt_oracle
(
   c0 VARCHAR2(4000),
   c1 VARCHAR2(4000),
   c2 VARCHAR2(4000),
   c3 VARCHAR2(4000))
ORGANIZATION EXTERNAL
   (TYPE ORACLE_HIVE
    DEFAULT DIRECTORY DEFAULT_DIR
    ACCESS PARAMETERS (
      com.oracle.bigdata.cluster=hadoop1
      com.oracle.bigdata.tablename=default.my_hive_table
    )
   )
PARALLEL 2 REJECT LIMIT UNLIMITED