MySQL AI User Guide
The
NL2ML
routine enables you to learn about MySQL AI AutoML by providing
relevant citations to MySQL AI documentation. You can also leverage
the
ML_GENERATE
routine or an MCP server with external LLMs to generate AutoML
queries you can copy and run.
This topic has the following sections.
To use this feature, you must load the appropriate version
of MySQL AI documentation to the folder defined by
secure_file_priv. See
Load MySQL AI Documentation.
Review the required tasks to Create a Machine Learning Model.
Before using this feature, you must load the appropriate version
of the MySQL AI documentation into the MySQL AI directory defined by
secure_file_priv.
To load the documentation:
In the top-right corner of this page, make sure that the correct verion of MySQL AI is selected.
In the bottom-left corner of this page, click the link to download the PDF version of the documentation.
Rename the downloaded PDF file to
mysql_ai_en.pdf.
Log into your MySQL AI instance and upload the PDF file to the
MySQL AI directory defined by
secure_file_priv. Ensure
that the file has the appropriate read access for all users,
so that MySQL AI can read the file.
If you do not know the appropriate directory, you can run the following command:
mysql> SELECT @@secure_file_privSee LOAD DATA Statement.
To use
NL2ML
to provide citations to MySQL HeatWave documentation, and then leverage
in-database LLMs to generate responses that include appropriate
table schemas and commands, do the following:
Set the skip_generate option to
true with the
@nl2ml_options session variable.
Use
ML_RETRIEVE_SCHEMA_METADATA
to retrieve the table schema related to the question asked
during the
NL2ML
routine.
Use GROUP_CONCAT() to build a
compact context string from the citations provided by
NL2ML.
Use
ML_GENERATE
to specify the in-database LLM and generate the response to
the question, which includes the citations, context, and
retrieved table schema.
See the following example.
To use in-database LLMs with
NL2ML:
Specify the question and set it into a variable
(@input).
mysql> SET @input = "How can I train a model to predict net worth of a singer?";
Set the skip_generate option to
true with the
@nl2ml_options session variable.
mysql> SET @nl2ml_options = JSON_OBJECT("skip_generate", true);
Run the
NL2ML
routine and include the previous variable that has the
question.
mysql> CALL sys.NL2ML(@input, @out);
View the output generated from the question by selecting the
@out variable.
mysql> SELECT JSON_PRETTY(@out);
JSON_PRETTY(@out)
{
"citations": [
{
"segment": "<segment content>",
"distance": 0.1023,
"document_name": <mysql_ai_en.pdf>,
"segment_number": <segment number>
},
...
],
"retrieval_info": {
"method": "n_citations",
"threshold": 0.114
}
}The output includes citations with the following information:
segment: The relevant excerpts from
the MySQL AI documentation.
distance: A value indicating how
relevant the segment is to the question asked. A lower
value represents a more relevant segment.
document_name: A reference to the
MySQL AI documentation.
segment_number: The index number
identifying the segment.
Use
ML_RETRIEVE_SCHEMA_METADATA
to retrieve the most relevant table schema for the previous
question.
mysql> CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL);
Retrieve the table schema from the
@retrieved variable.
mysql> SELECT @retrieved;
@retrieved
CREATE TABLE `mlcorpus`.`singer`(
`Singer_ID` int,
`Name` varchar,
`Birth_Year` double,
`Net_Worth_Millions` double,
`Citizenship` varchar
);
Use GROUP_CONCAT() to build a
compact context string from the citations provided by
NL2ML.
mysql> SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx
FROM JSON_TABLE(JSON_EXTRACT(@out,'$.citations'),
'$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt;
Combine the retrieved table schema and citations as the final context.
mysql> SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved);
Use
ML_GENERATE
to specify an in-database LLM
(llama3.2-3b-instruct-v1) and manually
create a SQL statement that includes the citations, context,
and retrieved table schema.
mysql> SELECT sys.ML_GENERATE(
@input,
JSON_OBJECT(
"task", "generation",
"model_id", "llama3.2-3b-instruct-v1",
"context", @final_ctx
)
) INTO @result;
Generate the output and SQL text.
mysql> SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql;
generated_sql
To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset,
which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns,
such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'.
The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data,
for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary.
Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth,
you would specify the task as 'regression' in the JSON options. Here's a simplified example:
```sql
CALL sys.ML_TRAIN('mlcorpus.singer',
@model_handle,
'Net_Worth_Millions',
JSON_OBJECT('task', 'regression',
'algorithm', 'XGBRegressor'));
```
Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions'
based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine
to generate predictions for new, unseen data.Review Machine Learning Use Cases.
Review the syntax and examples for the
NL2ML
routine.