MySQL HeatWave User Guide
Lakehouse Auto Parallel Load, which extends the Auto Parallel Load feature of MySQL HeatWave, facilitates the process of loading data from Object Storage into MySQL HeatWave by automating many of the steps involved, including:
Excluding schemas, tables, and columns that Auto Parallel Load cannot load.
Verifying that there is sufficient memory available for the data.
Optimizing load parallelism based on machine learning models.
Loading data into MySQL HeatWave.
Defining LAKEHOUSE as the engine for
tables that MySQL HeatWave loads.
Defining the ENGINE_ATTRIBUTE for tables
that MySQL HeatWave loads.
Lakehouse Auto Parallel Load also includes Lakehouse Incremental Load, which can refresh tables after an initial load.
To run Auto Parallel Load in normal mode, the
MySQL HeatWave Cluster must be active.
To use Auto Parallel Load, ask the admin user to grant you the following
privileges. Replace user_name and
database_name in the commands with
the appropriate user name and database name.
The PROCESS privilege on the
appropriate database in the DB System.
mysql> GRANT PROCESS ON database_name.* TO 'user_name'@'%';
The EXECUTE privilege on the
sys schema.
mysql> GRANT EXECUTE ON sys.* TO 'user_name'@'%';
The SELECT privilege.
mysql> GRANT SELECT ON performance_schema.* TO 'user_name'@'%';
Lakehouse Auto Parallel Load includes schema inference, and uses it in one of two ways:
Lakehouse Auto Parallel Load analyzes the data, infers the table structure, and
creates the database and all tables. This only requires
the name of the database, the names of each table, the
external file parameters, and then Lakehouse Auto Parallel Load generates the
CREATE DATABASE and
CREATE TABLE statements.
Lakehouse Auto Parallel Load uses header information from the external files to
define the column names. If this is not available, Lakehouse Auto Parallel Load
defines the column names sequentially:
col_1, col_2,
col_3 ...
If the tables already exist, Lakehouse Auto Parallel Load analyzes the data,
infers the table structure, and then modifies the
structure to avoid errors during data load. For example,
if a table defines a column with
TINYINT, but Lakehouse Auto Parallel Load infers
that the data requires
SMALLINT
MEDIUMINT,
INT, or
BIGINT, then Lakehouse Auto Parallel Load
modifies the structure accordingly. If the inferred data
type is incompatible with the table definition, Lakehouse Auto Parallel Load
raises an error, and specifies the column as NOT
SECONDARY. To learn more, see
Load External Data Using Lakehouse Auto Parallel Load with an Existing Table.
If you are on a version earlier than MySQL 8.4.0, refer to Lakehouse Auto Parallel Load with The external_tables Option for more information on the appropriate syntax to use.
As of MySQL 9.5.0, Lakehouse Auto Parallel Load is able to infer the format of
temporal data types. To do this, specify the appropriate
external table option (EXTERNAL_FORMAT,
DATE FORMAT, DATETIME
FORMAT, and TIME FORMAT for
SQL
syntax, and date_format,
time_format, and
timestamp_format for
JSON
syntax) and set the option to
auto. The auto
setting is the default value if no temporal format is
specified. To review examples, see
Lakehouse
External Table SQL Syntax and
External
Table JSON Syntax.
Inferring formats of temporal data types has the following requirements and specifications:
This is only available for files in CSV format.
Only the temporal formats specified at the dialect level are considered during inference. Column level formats, if any, are ignored during inference.
To review supported values for format specifiers of
all temporal formats, see
date_format. Not
including at least one supported format specifier
generates an error.
If you use an ambiguous format to specify temporal
data types (for example,
date_format as
%m:%d:%y and
time_format as
%h:%i:%s), and the data matches
both formats, the following order of precedence is
used for inferring the data type:
DATE >
TIME >
TIMESTAMP >
DATETIME.
Any extra characters in the temporal format that are
not format specifiers must match exactly with the
corresponding characters in the data for the temporal
data type to be recognized. For example, the specified
format "timestamp_format":
"abc%m:%d:%yabc%h:%i:%s" would match
"abc3:21:2022abc8:22:44", but not
"3:21:2022 8:22:44".
Any setting for the strict mode option
(STRICT_LOAD or
is_strict_mode) does not affect
inference for temporal data types. This also includes
MySQL modes such as NO_ZERO_DATE
and NO_ZERO_IN_DATE. Invalid and
zero dates are still considered as valid temporal
values during inference. The impact of the
sql_mode is
considered during the loading of tables into
Lakehouse.
If you set a specific format at the column level for a temporal data type when creating a table, and a different format is inferred when loading the table to Lakehouse, the specific format used for the column when creating the table is prioritized and used for loading the column.
Review the following table of supported formats for temporal data types. If multiple formats are valid for the inferred data, the first valid format listed in the following table is used as the inferred temporal format.
Table 4.8 Supported formats for inferred temporal data types
| Temporal Data Type | Format Pattern | Example Values |
|---|---|---|
DATE |
%Y-%m-%d |
2025-08-09 2025-08-9 2025-8-09 |
DATE |
%m/%d/%Y |
08/09/2025 08/9/2025 8/09/2025 |
DATE |
%d/%m/%Y |
09/08/2025 09/8/2025 9/08/2025 |
TIME |
%H:%i:%s.%f (24-hour format) |
22:30:05.100200 |
TIME |
%H:%i (24-hour format) |
22:30 |
TIME |
%I:%i %p (12-hour format) |
10:30 PM |
TIME |
%I:%i:%S %p (12-hour format) |
10:30:05 PM |
TIMESTAMP |
%Y-%m-%d %H:%i:%s.%f |
2025-08-20 14:55:30.123456 |
TIMESTAMP |
%Y-%m-%d %H:%i |
2025-08-20 14:55 |
TIMESTAMP |
%Y-%m-%dT%H:%i:%s.%f |
22025-08-20T14:55:01.000000 |
TIMESTAMP |
%Y-%m-%dT%H:%i |
2025-08-20T14:55 |
Review the following:
The Auto Parallel Load HEATWAVE_LOAD command
Learn how to Create External Tables Using Lakehouse Auto Parallel Load.
Learn how to Refresh Data Using Event-Based Incremental Load.