MySQL HeatWave User Guide
External tables use
table-level
and
column-level
ENGINE_ATTRIBUTE options of the
CREATE
TABLE statement to specify the parameters
needed to process data stored externally in Object Storage.
The column-level ENGINE_ATTRIBUTE overrides
specific dialect parameters, if required.
As of MySQL 9.4.0, you can use the CREATE EXTERNAL
TABLE statement to specify
ENGINE_ATTRIBUTE options.
CREATE EXTERNAL TABLE table_name (col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...) ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
If you are on a version earlier than MySQL 9.4.0, you must
use the CREATE TABLE
statement to specify ENGINE_ATTRIBUTE
options. You must also manually set
ENGINE to lakehouse,
and SECONDARY_ENGINE to
rapid.
CREATE TABLE table_name (col_name data_type [ENGINE_ATTRIBUTE='Col_Engine_Attribute_JSON'], ...) ENGINE=lakehouse SECONDARY_ENGINE=rapid ENGINE_ATTRIBUTE='Tbl_Engine_Attribute_JSON';
After creating a table, you can load data into the table in the following ways:
Load
the data automatically with Lakehouse Auto Parallel Load by using the
HEATWAVE_LOAD command. See how to
Load External Data Using Lakehouse Auto Parallel Load with an Existing Table.
Load
the data manually with the
ALTER TABLE statement and
the option to use Guided Load.
You can also use ENGINE_ATTRIBUTE options
with the HEATWAVE_LOAD command to load
data using Lakehouse Auto Parallel Load. See
HEATWAVE_LOAD
and Load
Structured Data Using Lakehouse Auto Parallel Load.
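As a sketch, assuming table_1 was already created as shown above in a schema named data_db, the two loading approaches look like this:
mysql> CALL sys.heatwave_load(JSON_ARRAY('data_db'), NULL);

mysql> ALTER TABLE table_1 SECONDARY_LOAD;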
The external table ENGINE_ATTRIBUTE is a
JSON object that consists of the dialect
parameters and the file parameters. They are represented in
the following ways:
file_JSON_array is a
JSON array containing the
JSON objects representing the file
location details and file parameters. You can use the
following methods to specify these details:
Uniform Resource Identifiers (URI)
Available as of MySQL 9.3.1 for MySQL HeatWave on OCI. You can provide an OCIFS URI, PAR URI, or Native URI.
Resource Principals
See Resource Principals in Oracle Cloud Infrastructure Documentation.
Pre-Authenticated Requests (PAR)
See Object Storage Pre-Authenticated Requests in Oracle Cloud Infrastructure Documentation.
Refer to File Options.
dialect_JSON_object is a
JSON object containing the
dialect
parameters to specify the data format in the
files.
Tbl_Engine_Attribute_JSON:
  begin-object
    "file": file_JSON_array
    [, "dialect": dialect_JSON_object]
    [, "auto_refresh_event_source": "stream_OCID"]
    [, "match_columns_by": "order" | "name_case_sensitive" | "name_case_insensitive"]
  end-object

file_JSON_array:
  begin-array
    file_def [, file_def] ...
  end-array

file_def:
  file_RP_JSON | file_PAR_JSON | file_URI_JSON

file_RP_JSON: {
  "bucket": "bucket_name",
  "namespace": "namespace_name",
  "region": "region_name",
  {"prefix": "prefix_value" | "name": "filename" | "pattern": "pattern_value"},
  "is_strict_mode": {true | false},
  "allow_missing_files": {true | false}
}

file_PAR_JSON: {
  "par": "PAR URL",
  ["prefix": "prefix_value", | "name": "filename", | "pattern": "pattern_value",]
  "is_strict_mode": {true | false},
  "allow_missing_files": {true | false}
}

file_URI_JSON: {
  ("uri": "OCIFS_URI") | ("uri": "PAR_URI") | ("uri": "NATIVE_URI"),
  "is_strict_mode": {true | false},
  "allow_missing_files": {true | false}
}

dialect_JSON_object:
  begin-object
    dialect_option_key_value [, dialect_option_key_value] ...
  end-object

dialect_option_key_value: {
  "format": {"avro" | "csv" | "json" | "parquet" | "delta"}
  | "check_constraints": {true | false}
  | "field_delimiter": {"|" | "," | "\\t" | ";" | "auto" | "custom_field_delimiter"}
  | "record_delimiter": {"\\r" | "\\n" | "\\r\\n" | "auto" | "custom_record_delimiter"}
  | "escape_character": {"\\\\" | "" | "custom_escape_character"}
  | "quotation_marks": {"\\"" | "" | "custom_quotation_marks"}
  | "encoding": "utf8mb4"
  | "date_format": "custom_date_format"
  | "time_format": "custom_time_format"
  | "timestamp_format": "custom_timestamp_format"
  | "trim_spaces": {true | false}
  | "skip_rows": Integer_20
  | "has_header": {true | false}
  | "is_strict_mode": {true | false}
  | "compression": {"zip" | "gzip" | "bzip2" | "auto"}
  | "embed_model_id": {"MySQL HeatWave_in_database_embedding_model"}
}

Integer_20: Represents an integer value between 0 and 20.
When you specify ENGINE_ATTRIBUTE
parameters in JSON format, you must
escape certain characters using \
(backslash). For example, to specify
linefeed as a record delimiter, you need to specify it
as "\\n" in the
ENGINE_ATTRIBUTE parameter’s
JSON value.
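For example, the following sketch (bucket, namespace, and file name are placeholders) passes a linefeed record delimiter as "\\n" inside the JSON value:
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "record_delimiter": "\\n"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';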
In the syntax description, square brackets
("[" and "]")
indicate optional words or clauses.
The syntax description uses a vertical bar
("|") to separate alternatives. The
list of alternatives is enclosed in square brackets
("[" and "]")
when you may choose one member. The list of
alternatives is enclosed in curly braces
("{" and "}")
when you must choose one member.
The terms begin-array and
end-array in the syntax
represent a JSON array. While
writing the sample, replace them with
[].
The terms begin-object and
end-object in the syntax
represent a JSON object. While
writing the sample, replace them with
{}.
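For example, replacing these terms with brackets and braces gives a concrete table-level value such as the following (bucket, namespace, and file name are placeholders):
'{"file": [{"region": "us-ashburn-1", "namespace": "tenant_1", "bucket": "bucket_1", "name": "data_file_1.csv"}],
"dialect": {"format": "csv"}}'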
The following examples create tables with the
CREATE EXTERNAL TABLE statement. If
you are on a version earlier than MySQL 9.4.0, you
must use CREATE TABLE and set
ENGINE to
lakehouse, and
SECONDARY_ENGINE to
rapid. See
CREATE TABLE Statement.
For versions earlier than MySQL 9.5.0, you can override some
options at the column level by specifying column-level
parameters. For MySQL 9.5.0 and later, you can do this by
loading
data manually with Guided Load disabled. You can set the
date_format,
time_format,
timestamp_format, and
model as optional column-level
ENGINE_ATTRIBUTE parameters. For more
information, refer to
dialect
parameters.
As of MySQL 9.5.0, Lakehouse Auto Parallel Load and Guided Load can infer supported
formats for temporal data types by setting the appropriate
dialect-level temporal formats to auto.
The auto setting is the default value if
no temporal format is specified. If you set a specific
format at the column level for a temporal data type when
creating a table, and a different format is inferred when
loading the table to Lakehouse, the specific format used
for the column when creating the table is prioritized and
used for loading the column. See
About Lakehouse Auto Parallel Load Schema Inference to
learn more.
Col_Engine_Attribute_JSON:
  begin-object
    column_option_key_value [, column_option_key_value] ...
  end-object

column_option_key_value: {
  "date_format": "custom_date_format"
  | "time_format": "custom_time_format"
  | "timestamp_format": "custom_timestamp_format"
  | "model": {MySQL HeatWave_in_database_embedding_model}
}
model: Supported as of MySQL 9.3.2.
Lets you specify the column-level embedding model for
VECTOR columns. You must
specify a model
supported
by MySQL HeatWave GenAI. If the vector uses another
embedding model, you cannot use it with GenAI
features. You can include an unsupported embedding model
in the column or table comments for your own reference.
Learn more about the
Vector Data Type.
The following example sets the embedding model for the
column to multilingual-e5-small.
mysql> CREATE EXTERNAL TABLE table_1(col_1 VECTOR(5) ENGINE_ATTRIBUTE '{"model": "multilingual-e5-small"}')
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The following example sets the
date_format,
time_format, and
timestamp_format at the column level.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 date ENGINE_ATTRIBUTE '{"date_format": "%d/%m/%Y"}',
col_2 time ENGINE_ATTRIBUTE '{"time_format": "%I:%i %p"}',
col_3 timestamp ENGINE_ATTRIBUTE '{"timestamp_format": "%Y-%m-%d %H:%i:%s"}',
col_4 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The following example sets the
date_format,
time_format, and
timestamp_format options to
auto, which allows Lakehouse Auto Parallel Load and Guided Load to
infer supported formats for temporal data types.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 date ENGINE_ATTRIBUTE '{"date_format": "auto"}',
col_2 time ENGINE_ATTRIBUTE '{"time_format": "auto"}',
col_3 timestamp ENGINE_ATTRIBUTE '{"timestamp_format": "auto"}',
col_4 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';For more information about other column definitions, see Dialect Parameters for CSV Files.
The following parameters are required to create an external table:
ENGINE: Set to
lakehouse if you are on a version
earlier than MySQL 9.4.0. For MySQL 9.4.0 and later, you
can use the CREATE EXTERNAL TABLE
statement, which automatically sets
ENGINE to
lakehouse.
SECONDARY_ENGINE: Set to
rapid if you are on a version earlier
than MySQL 9.4.0. For MySQL 9.4.0 and later, you can use
the CREATE EXTERNAL TABLE statement,
which automatically sets
SECONDARY_ENGINE to
rapid.
ENGINE_ATTRIBUTE: JSON object
literal. Specifies the location of files, the file
format, and how the file format is handled.
Use key-value pairs in JSON format to
specify options. Lakehouse
uses the default setting if there is no defined option.
Use NULL to specify no arguments.
file: Defines the Object Storage
files. You can define the file locations using resource
principals, uniform resource identifier (URI) (as of
MySQL 9.3.1 for MySQL HeatWave on OCI), or PAR URL.
As of MySQL 9.4.1, Lakehouse supports a maximum of 164
file locations. As of MySQL 9.3.1, Lakehouse supports
a maximum of 170 file locations. Before MySQL 9.3.1,
Lakehouse supports a maximum of 256 file locations. To
define more than the maximum number of files, store the
files under the same bucket or use
prefix or pattern.
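For example, the following sketch (bucket, namespace, and folder names are placeholders) specifies two file locations; each JSON object in the file array defines one location:
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files_2024/"},
{"uri": "oci://mybucket@mynamespace/data_files_2025/"}]}';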
file parameters for uniform
resource identifiers (As of MySQL 9.3.1 for
MySQL HeatWave on OCI):
uri: The URI, depending on the
type of URI you define. Do not provide a
pattern, prefix, or name parameter with the
uri. These details are
inferred from the object_path
you define.
The following examples use the URI syntax for OCIFS:
oci://bucket_name@namespace_name/object_path
Name: If the object_path
is neither a glob pattern nor a prefix.
The following example specifies a single file by using an OCIFS URI.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
Prefix: If the
object_path is not a glob
pattern and ends with an unencoded
/ character, such as a
folder path.
The following example uses a prefix with an OCIFS URI. The prefix specifies files in an Object Storage folder.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/"}]}';
Glob pattern: If the
object_path contains at
least one of the following unencoded
characters: ?,
*, or
[. To use these
characters as literals, you need to escape
them or encode them as needed depending on
the URI syntax. Regex patterns are not
supported. See
Glob
Patterns from the Oracle Cloud Infrastructure
Documentation to learn more.
The following example uses an unencoded
* character to create a
glob pattern.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_files/data_file_*.csv"}]}';
Review additional examples that use URIs to create external tables using Auto Parallel Load or manually.
file parameters for
resource
principals:
bucket_name: Buckets allow
you to store objects in a compartment. To
learn more about buckets, see
Object
Storage Buckets in Oracle
Cloud Infrastructure Documentation.
namespace_name: This is the
top-level container for all buckets and objects.
To learn how to view your namespace, see
Object
Storage Namespaces in Oracle
Cloud Infrastructure Documentation.
region_name: You need to
define the region identifier for your OCI
tenancy. To view the list of available regions
and region identifiers, see
Regions and Availability Domains in
Oracle Cloud Infrastructure
Documentation.
As of MySQL 9.2.1, if the region is not specified, the instance's associated region is used as the default region. Otherwise, the specified region is used.
The following example specifies a single file by using resource principals.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"region": "us-ashburn-1",
"namespace": "tenant_1",
"bucket": "bucket_1",
"name": "data_file_1.csv"}]}';
Review additional examples that use resource principals to create external tables using Lakehouse Auto Parallel Load or manually.
file parameters for
pre-authenticated
requests:
par: The PAR URL.
Do not specify a region,
namespace, or
bucket with
par. These values are
included in the PAR URL and generate an error if
defined as separate parameters. For more
information, refer to
PAR Recommendations.
The following example specifies a single file with a PAR.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/data_file_1.csv"}]}';
Review additional examples that use PARs to create external tables using Lakehouse Auto Parallel Load or manually.
The following file parameters apply
only to PAR and resource principals. Use one of the
parameters, unless the target defines a specific file.
name: A specific Object Storage
file name. For
pre-authenticated
requests, the file name provided in the PAR
URL is considered.
The following example uses a PAR and the
name parameter to specify a file
in an Object Storage bucket. The PAR points to the
Object Storage bucket.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"par": "https://objectstorage.us-ashburn-1.oraclecloud.com/p/.../n/tenancy_1/b/bucket_1/o/",
"name": "data_files/data_file_1.csv"}]}';
pattern: A regular expression
that defines a set of Object Storage files.
The following example uses resource principals and a pattern to load files that have a numerical suffix.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"region": "us-ashburn-1",
"namespace": "tenant_1",
"bucket": "bucket_1",
"pattern": "data_files/data_file_\\\\d+\\\\.csv"}]}';
prefix: The prefix for a set of
Object Storage files. For
pre-authenticated
requests, the prefix or bucket name present in
the PAR URL is considered.
The following example uses resource principals and a prefix to specify files to load in an Object Storage folder.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"region": "us-ashburn-1",
"namespace": "tenant_1",
"bucket": "bucket_1",
"prefix": "data_files/"}]}';Review examples that use PARs to create external tables using Lakehouse Auto Parallel Load or manually.
Review examples that use resource principals to create external tables using Lakehouse Auto Parallel Load or manually.
The following file parameters are optional:
compression: Supported as of MySQL
9.3.2. Use this to load compressed files.
Compression is supported for all file formats:
CSV: The default value is no
compression. You can set
compression to
zip, gzip, or
bzip2.
The following example specifies a CSV file
compressed in zip format.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "compression": "zip"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv.zip"}]}';
JSON: The default value is no
compression. You can set
compression to
zip, gzip, or
bzip2.
The following example specifies a JSON file
compressed in gzip format.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "json", "compression": "gzip"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.json.gz"}]}';
Avro: The default value is
auto. You cannot set any other
options for compression, as Avro
files declare their compression format in the file
metadata.
The following example specifies an Avro file with
the auto compression option.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "avro", "compression": "auto"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.avro"}]}';
Parquet: The default value is
auto. You cannot set any other
options for compression, as
Parquet files declare their compression format in
the file metadata.
The following example specifies a Parquet file with
the auto compression option.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "parquet", "compression": "auto"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"}]}';Note the following when loading compressed files:
Each compressed file must only contain a single original file. You cannot provide a compressed file with archives or folders with multiple files.
Files in zip format must be
compressed with the Deflate algorithm.
If loading multiple files, all the files must be in the same compression format.
When loading multiple compressed files, it is best to prepare files of similar size, and in the range of 80-120 MB each. Otherwise, you may experience longer loading times.
is_strict_mode: Loads data in strict
mode or non-strict mode, based on the selected option.
This overrides the is_strict_mode
dialect parameter. To learn about the default value, see
the is_strict_mode dialect parameter
in
Common Dialect Parameters.
The supported options are:
true: Loads the data in strict
mode. The data loading stops if there is an error
due to missing files, empty columns, formatting
errors or parsing errors.
false: Loads the data in
non-strict mode. Missing files, empty columns,
formatting errors, or parsing errors display a
warning, and data is loaded. If there are missing
columns, implicit default values are used in the
absence of user-provided default values. See
Data Type Default Values to learn more.
The following example sets
is_strict_mode to
false in the file
parameters.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": "format": "csv"},
"file": [{"oci://mybucket@mynamespace/data_files/data_file_1.csv",
"is_strict_mode": false}]}';
allow_missing_files: Supported as of
MySQL 8.4.0. Handles missing files based on the selected
option. This overrides the dialect parameter
is_strict_mode for missing files. To
learn about the default value, see the
allow_missing_files dialect parameter
in
Common Dialect Parameters.
The supported options are:
true: If any file is missing, no
error occurs and data loading continues with the
existing files. An error occurs only if none of the
files are available.
false: If any file is missing, an
error occurs and data is not loaded.
The following example uses a glob pattern specified with
a uniform resource identifier (URI) and sets
allow_missing_files to
true in the file
parameters. If any of the specified files in the pattern
are not available, an error does not occur and data
continues loading for the existing files. If all files
are missing, an error occurs.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_[1-10].csv",
"allow_missing_files": true}]}';
auto_refresh_event_source: Supported
as of MySQL 9.4.1. Specifies a Stream OCID for
event-based incremental load. This string value is
verified as a valid Stream resource identifier. Learn
more about
Resource Identifiers from Oracle Cloud Infrastructure
Documentation. Learn how to
Refresh Data Using Event-Based Incremental Load.
The following example specifies a Stream OCID to set up event-based incremental load.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"auto_refresh_event_source": "ocid1.stream.oc1.iad...",
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
match_columns_by: Supported as of
MySQL 9.5.0. Allows you to map columns from Parquet
files to external tables based on column order, column
name using case sensitivity, or column name using case
insensitivity.
match_columns_by has the following
requirements and specifications:
The external file must be in
parquet format.
If there is a difference between the number of
columns in the external table and the Parquet file,
you must set is_strict_mode to
false (non-strict mode).
If the external table has more columns than the Parquet file, implicit default values are used for the missing columns in the absence of user-provided default values. See Data Type Default Values to learn more.
If the Parquet file has columns with the same name
but different uppercase and lowercase characters
(for example, col_1 and
Col_1), Auto Parallel Load and Guided Load generate
errors for duplicate column names even if you use
the name_case_sensitive option.
If you create a new table with Auto Parallel Load and use the
name_case_insensitive option,
column names in the external table are created in
all lowercase characters.
match_columns_by has the following
options:
order: The external table schema
maps to the Parquet file schema by column order. For
example, the first column in the table schema maps
to the first column in the file schema, and so on.
This is the default value.
name_case_sensitive: The external
table schema maps to the Parquet file by column name
using case sensitivity.
name_case_insensitive: The
external table schema maps to the Parquet file by
column name using case insensitivity.
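As a sketch (the file name is a placeholder), the following statement combines name_case_insensitive with non-strict mode so that a difference in column counts between the table and the Parquet file does not stop the load:
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "parquet", "is_strict_mode": false},
"match_columns_by": "name_case_insensitive",
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"}]}';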
The following example loads columns from a Parquet file based on name using case sensitivity. Lakehouse maps columns in the Parquet file to the columns in the external table (col_1, col_2, col_3) with the same name and case sensitivity.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "parquet"},
"match_columns_by": "name_case_sensitive",
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"]}';The dialect parameters are optional and are used to define the data configuration.
The following dialect parameters are applicable to all file
formats (CSV, JSON,
Parquet and Avro).
format: The file format defined in a
table. You can define only one file format per table.
The supported file formats are:
csv: The default file format.
The following example specifies a CSV file.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
json: Supported as of MySQL
8.4.0. Only Newline Delimited JSON files are
supported. Tables created with
json format must only have a
single column that conforms to the
JSON data type.
The following example specifies a JSON file.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 JSON)
ENGINE_ATTRIBUTE='{"dialect": {"format": "json"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.json"}]}';
parquet: The
Parquet
file format.
The following example specifies a Parquet file.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "parquet"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.parquet"}]}';
avro: The
Avro
file format.
The following example specifies an Avro file.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "avro"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.avro"}]}';
delta: Supported as of MySQL
9.5.0. Loads Delta Lake tables. See
Delta
Lake Tables and
Limitations
for Delta Lake Tables to learn more.
The following example specifies a Delta Lake table.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "delta"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/"}]}';
check_constraints: Supported as of
MySQL 8.4.0. Lakehouse validates primary key and
unique key constraints during the initial load based on
the selected option. The supported options are:
true: The default value.
Lakehouse validates primary key and unique key
constraints only during the initial load of the
table. If there are subsequent loads or refreshes of
the table, validation does not occur.
false: Lakehouse does not
validate primary key and unique key constraints.
The following example sets
check_constraints to
false.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "check_constraints": false},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
is_strict_mode: Loads data in strict
mode or non-strict mode, based on the selected option.
By default, is_strict_mode is set to
the sql_mode value. See
Strict SQL Mode. The
file common parameter
is_strict_mode can override this
setting. The supported options are:
true: Loads the data in strict
mode. The data loading stops if there is an error
due to missing files, empty columns, formatting
errors or parsing errors.
As of MySQL 8.4.0, the dialect parameter
is_strict_mode applies to all
file formats. Before MySQL 8.4.0, it only applies to
the CSV file format. For
Avro and
Parquet file formats, use the
file parameter
is_strict_mode to define strict
mode before MySQL 8.4.0.
false: Loads the data in
non-strict mode. Missing files, empty columns,
formatting errors, or parsing errors display a
warning, and data is loaded. If there are missing
columns, implicit default values are used in the
absence of user-provided default values. See
Data Type Default Values to learn more.
If no value is set, the setting is defined by the
sql_mode.
Review Table 4.7, “Combinations of Settings for allow_missing_files and is_strict_mode”.
This setting overrides the global
sql_mode setting for
handling missing files.
The following example sets
is_strict_mode to
false in the
dialect parameters.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "is_strict_mode": false},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
allow_missing_files: Supported as of
MySQL 8.4.0. Handles missing files based on the selected
option. This overrides the dialect parameter
is_strict_mode for missing files. The
supported options are:
true: An error does not occur for
any missing file, and data loading continues with
the existing files. An error occurs only if none of
the files are available.
false: If any file is missing, an
error occurs and data is not loaded.
If no value is set, the setting is defined by
is_strict_mode. If no value is
set for is_strict_mode, the
setting is defined by the
sql_mode.
Review Table 4.7, “Combinations of Settings for allow_missing_files and is_strict_mode”.
A missing file is defined as:
With the name parameter: There is
no file with that name.
With the pattern parameter: There
are no files that match the pattern.
With the prefix parameter: There
are no files with that prefix.
The following example uses a glob pattern specified with
a uniform resource identifier (URI). It sets
allow_missing_files to
true in the
dialect parameters. If any of the
specified files in the pattern are not available, an
error does not occur and data continues loading for the
existing files. If all files are missing, an error
occurs. If a file-specific setting is set, this may
override the allow_missing_files
setting.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "allow_missing_files": true},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_[1-10].csv"}]}';
Refer to the following table to review how different
combinations of settings for
allow_missing_files and
is_strict_mode affect errors for missing
files.
Table 4.7 Combinations of Settings for allow_missing_files and is_strict_mode
| allow_missing_files | is_strict_mode | Possible Errors |
|---|---|---|
| T | T | No error generated unless all files are missing. |
| T | F | No error generated unless all files are missing. |
| T | Not set | No error generated unless all files are missing. |
| Not set | F | No error generated unless all files are missing. |
| F | T | Error generated for any missing files. |
| F | F | Error generated for any missing files. |
| F | Not set | Error generated for any missing files. |
| Not set | T | Error generated for any missing files. |
| Not set | Not set | Error generation depends on the setting for sql_mode. |
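For example, the following sketch (file names are placeholders) corresponds to the first row of the table: strict mode still stops the load on formatting or parsing errors, but missing files only produce a warning unless every file is missing.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "is_strict_mode": true, "allow_missing_files": true},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_*.csv"}]}';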
The following dialect parameters are applicable to
CSV and JSON file
formats.
If these parameters are used with Avro
or Parquet file formats, an error
occurs.
encoding: Defines the character
encoding. The default is "utf8mb4".
record_delimiter: Specifies one or
more characters used to delimit records. The maximum
record delimiter length is 64 characters. You can add
any string as a
custom_record_delimiter.
The default record delimiter for json
is "\n". The only alternative for
json is "\r\n".
For CSV files, if you set the
record_delimiter to
auto, Auto Parallel Load automatically detects the
record delimiters and composite record delimiters with
field delimiters as prefixes. Supported as of MySQL
9.2.0 and set as the default value.
The following record delimiters are detected automatically by Auto Parallel Load:
\r: Carriage return.
\n: Line feed. This is the
default for all MySQL versions prior to 9.2.0.
\r\n: Carriage return and line
feed.
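For example, the following sketch (the file name is a placeholder) overrides the default "\n" record delimiter for a JSON file with "\r\n", the only supported alternative for json:
mysql> CREATE EXTERNAL TABLE table_1(
col_1 JSON)
ENGINE_ATTRIBUTE='{"dialect": {"format": "json", "record_delimiter": "\\r\\n"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.json"}]}';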
The following dialect parameters are applicable to
CSV and Parquet file
formats.
embed_model_id: Supported as of MySQL
9.3.2. Lets you specify the embedding model for
VECTOR columns in the
table. You must specify a model
supported
by MySQL HeatWave GenAI. To specify an embedding model for a
table, you must use
Auto Parallel Load
to automatically create the table and load the data. See
the example below. If you want to use a CREATE
EXTERNAL TABLE statement to create the table
manually and specify an embedding model, provide the
embedding model as a
column
definition. Learn more about the
Vector Data Type.
The following example uses Auto Parallel Load to set the embedding
model for all VECTOR
columns detected in the table to
minilm. Keep in mind that if the
table already exists and there are
VECTOR columns with an
embedding model already set, the existing model is not
replaced with the one specified in this example.
mysql> SET @input_list = '[{
"db_name": "data_db",
"tables": [{
"table_name": "table_1",
"engine_attribute": {
"dialect": {"format": "csv", "embed_model_id": "minilm"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]
}
}]
}]';
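A hedged follow-up: assuming the input list is passed directly as the first argument of heatwave_load, as in recent Lakehouse Auto Parallel Load examples, and that NULL accepts the default options, the load can then be run as follows.
mysql> CALL sys.heatwave_load(CAST(@input_list AS JSON), NULL);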
The following dialect parameters are applicable only to
the CSV file format.
field_delimiter: Specifies one or
more characters used to delimit fields. The maximum
field delimiter length is 64 characters. You can add any
string as a
custom_field_delimiter.
When the field_delimiter is set to
auto, Auto Parallel Load automatically detects the
field delimiters. Supported as of MySQL 9.2.0 and set as
the default value.
The following field delimiters are detected automatically by Auto Parallel Load:
|: Pipe. Default for all MySQL
versions prior to 9.2.0.
,: Comma
\t: Tab
;: Semicolon
The following example sets the
record_delimiter and
field_delimiter. The backslash
character, \, needs to be escaped
twice since it is inside a JSON object.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"record_delimiter": ";",
"field_delimiter": "\\r\\n"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
escape_character: Specifies one or
more characters used to escape special characters. The
default is "\\". You can add any
string as a
custom_escape_character. As
of MySQL 9.5.0, you have the option to ignore the escape
character by setting its value to an empty string.
quotation_marks: Specifies one or
more characters used to enclose fields. The default is
"\'". You can add any string as
custom_quotation_marks. As of
MySQL 9.5.0, you have the option to ignore the quote
character by setting its value to an empty string.
Consider the following example for setting empty strings
for escape and quotation characters. You have the
following data in a CSV file:
"{""doc1"":{\""doc2\"":22}}".
You create an external table and set an empty string
for the escape character: "dialect":
{"escape_character": ""}.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"escape_character": ""},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The value loads as:
{"doc1":{\"doc2\":22}}.
You create an external table and set an empty string
for the quotation character: "dialect":
{"quotation_marks": ""}.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"quotation_marks": ""},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The value loads as:
"{""doc1"":{""doc2"":22}}".
You create an external table and set empty strings
for both the escape character and quotation
character: "dialect": {"quotation_marks":
"", "escape_character": ""}.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"escape_character": "",
"quotation_marks": ""},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The value loads as:
"{""doc1"":{\""doc2\"":22}}".
The following example sets the
escape_character and
quotation_marks. The backslash
character, \, needs to be escaped
twice since it is inside a JSON object.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"escape_character": "\\\\",
"quotation_marks": "\\""},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
date_format: The date format. See
date_format.
You can also set date formats for each column by using
the column ENGINE_ATTRIBUTE option,
which overrides the format in the dialect parameter. For
versions earlier than MySQL 9.5.0, this format is
ignored if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load. As of MySQL 9.5.0, the format
specified at the dialect level is used to infer the
temporal data if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load.
time_format: The time format. See
String and Numeric Literals in Date and Time Context.
You can also set time formats for each column by using
the column ENGINE_ATTRIBUTE option,
which overrides the format in the dialect parameter. For
versions earlier than MySQL 9.5.0, this format is
ignored if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load. As of MySQL 9.5.0, the format
specified at the dialect level is used to infer the
temporal data if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load.
timestamp_format: Supported as of
MySQL 9.0.1. The timestamp format. See
date_format. You can also
set timestamp formats for each column by using the
column ENGINE_ATTRIBUTE option, which
overrides the format in the dialect parameter. For
versions earlier than MySQL 9.5.0, this format is
ignored if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load. As of MySQL 9.5.0, the format
specified at the dialect level is used to infer the
temporal data if you
load
data using Lakehouse Auto Parallel Load or
manually with Guided Load.
As of MySQL 9.5.0, Lakehouse Auto Parallel Load and Guided Load can infer supported
formats for temporal data types by setting the
appropriate dialect option to auto.
The auto setting is the default value
if no temporal format is specified. See
About Lakehouse Auto Parallel Load Schema Inference to
learn more.
The following example sets the
date_format,
time_format, and
timestamp_format.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 date,
col_2 time,
col_3 timestamp,
col_4 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"date_format": "%d/%m/%Y",
"time_format": "%I:%i %p",
"timestamp_format": "%Y-%m-%d %H:%i:%s"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
The following example sets the
date_format,
time_format, and
timestamp_format options to
auto, which allows Auto Parallel Load and Guided Load to
infer supported formats for temporal data types.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 date,
col_2 time,
col_3 timestamp,
col_4 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv",
"date_format": "auto",
"time_format": "auto",
"timestamp_format": "auto"},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
trim_spaces: Removes or retains
leading and trailing spaces, based on the selected
option. The default value is false.
The following example sets
trim_spaces to
true.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "trim_spaces": true},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';
skip_rows: The number of rows to skip
at the start of the file. The default value is
0 and the maximum value is
20.
has_header: Specifies whether the
CSV file includes a header row, based on the selected
option. The default is false.
If has_header and
skip_rows are both specified,
Lakehouse first skips the number of rows, and then
uses the next row as the header row.
The following example creates an external table and
sets skip_rows and
has_header. Since both parameters are
set, the first row is skipped and the second row is set
as the header row.
mysql> CREATE EXTERNAL TABLE table_1(
col_1 int,
col_2 int,
col_3 int)
ENGINE_ATTRIBUTE='{"dialect": {"format": "csv", "skip_rows": 1, "has_header": true},
"file": [{"uri": "oci://mybucket@mynamespace/data_files/data_file_1.csv"}]}';Learn how to create external tables using Auto Parallel Load or manually.
Learn how to Refresh Data Using Event-Based Incremental Load.