MySQL HeatWave User Guide
MySQL 9.4.0 introduces the NL2ML
(natural language to machine learning) routine, which provides
step-by-step guidelines on using MySQL HeatWave AutoML for a particular
business problem. Use this routine to ask general questions
about MySQL HeatWave AutoML and to generate AutoML queries you can copy
and run.
Before MySQL 9.6.0, NL2ML
requires Oracle Cloud Infrastructure Generative AI, which is available in
specific regions. To confirm if your region has Generative
AI, see
Regions
with Generative AI.
As of MySQL 9.6.0, if your region does not have OCI
Generative AI, or you want to bypass OCI Generative AI and
instead use in-database LLMs to generate responses, you
can set the skip_generate option to
true. See
Use NL2ML with In-Database LLMs.
NL2ML uses pretrained foundational models supported by OCI Generative AI.
Before MySQL 9.6.0, MySQL HeatWave in-database LLMs are not supported. To review available OCI Generative AI models, see Pretrained Foundational Models in Generative AI. For every major MySQL release, NL2ML creates a new vector store from the MySQL HeatWave User Guide documentation. After creating each new vector store, the old vector store gets deleted. Due to the creation of vector stores, some NL2ML calls may take longer to complete.
As of MySQL 9.6.0, the
skip_generate option allows you to
use in-database LLMs instead of foundational models
supported by OCI Generative AI. See
Use NL2ML with In-Database LLMs.
mysql> CALL sys.NL2ML (query, response);
NL2ML parameters:
query: Enter a question in natural
language related to MySQL HeatWave AutoML. For example, "What are the
different types of machine learning models I can create?".
response: The name of the JSON object
session variable that contains the response to the
question.
The @nl2ml_options session variable allows
you to do the following:
Set the skip_generate option to
true or false. The
default value is false. You must set
this to true to bypass OCI Generative
AI and only provide citations. This also allows you to
leverage the citations as context to generate model
training guidance and SQL queries with in-database LLMs.
To learn more, see
Use NL2ML with In-Database LLMs.
View the chat history of the last 20 questions and current OCI Generative AI model:
mysql> SELECT JSON_PRETTY(@nl2ml_options);Reset the chat history. Use one of the following commands:
mysql>SET @nl2ml_options = NULL;mysql>SET @nl2ml_options = JSON_REMOVE(@nl2ml_options, '$.chat_history');
Change the current OCI Generative AI model. To review
available options for model_id, see
Pretrained
Foundational Models in Generative AI.
Change the current model and reset the chat history:
mysql> SET @nl2ml_options = JSON_OBJECT("model_id", "meta.llama-3.1-405b-instruct");
Change the current model and maintain the current chat history:
mysql> SET @nl2ml_options = JSON_SET(@nl2ml_options, '$.model_id', "meta.llama-3.1-405b-instruct");
Ask a question about using MySQL HeatWave AutoML:
mysql>CALL sys.NL2ML("Is there any limit on the size of training table?", @output);Query OK, 0 rows affected (11.0086 sec) mysql>SELECT @output;+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | @output | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"text": "Yes, there are limits on the size of the training table for HeatWave AutoML models. The table used to train a model cannot exceed 10 GB in size, 100 million rows, or 1017 columns."} | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.0003 sec)
Ask a question, and then ask a follow-up question based on the first one:
mysql>CALL sys.NL2ML("What is the purpose of task in ML_TRAIN?", @output);Query OK, 0 rows affected (32.4279 sec) mysql>SELECT @output;+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | @output | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"text": "The purpose of 'task' in ML_TRAIN is to specify the machine learning task, such as classification, regression, forecasting, anomaly detection, recommendation, or topic modeling."} | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.0003 sec) mysql>CALL sys.NL2ML("Can you give me an example of using that?", @output);Query OK, 0 rows affected (11.7221 sec) mysql>SELECT @output;+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | @output | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"text": "Here's an example of specifying a machine learning task in ML_TRAIN: \n\n```sql\nCALL sys.ML_TRAIN('mlcorpus.emails', 'spam_label', JSON_OBJECT('task', 'classification'), @model); \n```\n\nIn this example, the machine learning task is specified as 'classification' and the target column is 'spam_label' in the 'emails' table."} | +--------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.0003 sec)
Ask for an appropriate table in the DB System to use for a training query:
mysql>CALL sys.NL2ML("Can you give me a training query for email spam detection?", @output);Query OK, 0 rows affected (8.5084 sec) mysql>SELECT @output;+-------------------------------------------------------------------------------------------------------------------------------------------+ | @output | +-------------------------------------------------------------------------------------------------------------------------------------------+ | {"text": "```sql\nCALL sys.ML_TRAIN('mlcorpus.emails', 'spam_label', \n JSON_OBJECT('task', 'classification'), @email_spam_model);\n```"} | +-------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.0003 sec)
Request debugging assistance for an error message:
mysql>CALL sys.NL2ML("When I tried to train a model I got this error: can not train a regression or recommendation task with non-numeric targte column What does this mean? How can I resolve the issue?", @output);Query OK, 0 rows affected (18.0766 sec) mysql>SELECT @output;+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | @output | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | {"text": "The error \"can not train a regression or recommendation task with non-numeric target column\" means that you are trying to train a regression or recommendation model on a dataset where the target column (the column you are trying to predict) is not numeric. \n\nRegression models require a numeric target column, as they are designed to predict continuous values. Recommendation models also require numeric target columns, as they are designed to predict ratings or scores. \n\nTo resolve this issue, you need to either:\n\n1. Choose a different machine learning task that can handle non-numeric target columns, such as classification.\n2. Convert your target column to a numeric data type, if possible. For example, if your target column is a categorical variable with a limited number of categories, you can use one-hot encoding or label encoding to convert it to a numeric representation.\n3. Select a different column as the target column that is numeric. \n\nIn your case, since you are using the \"food_delivery\" table, you may want to check the data type of the column you are trying to predict and see if it can be converted to a numeric data type. If not, you may want to choose a different machine learning task or select a different column as the target column."} | +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.0003 sec)
Retrieve the chat history of questions and responses,
along with the current OCI Generative AI model, with the
@nl2ml_options session variable:
mysql> SELECT JSON_PRETTY(@nl2ml_options);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_PRETTY(@nl2ml_options) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"model_id": "meta.llama-3.1-405b-instruct",
"chat_history": [
{
"user_message": "Is there any limit on the size of training table?",
"chat_bot_message": "Yes, there is a size limit for training tables in HeatWave. The table used to train a model cannot exceed 10 GB, 100 million rows, or 1017 columns."
}
]
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.0003 sec)
Use the skip_generate option to bypass
OCI Generative AI content generation and only provide
citations to relevant documentation. Then, retrieve a
relevant table schema related to the question
(ML_RETRIEVE_SCHEMA_METADATA), build a
compact context string from the citations
(GROUP_CONCAT), and generate a response
that includes the citations, context, and retrieved table
schema (ML_GENERATE).
mysql>SET @input = "How can I train a model to predict net worth of a singer?";mysql>SET @nl2ml_options = JSON_OBJECT("skip_generate", true);mysql>CALL sys.NL2ML(@input, @out);mysql>SELECT JSON_PRETTY(@out);JSON_PRETTY(@out) { "citations": [ { "segment": "<segment content>", "distance": 0.1023, "document_name": "https://.../heatwave-en-ml-9.6.0.pdf", "segment_number": <segment number> }, ... ], "vector_store": [ "`ML_SCHEMA_userOne`.`hwml_vector_store_heatwave_en_ml_9_6_0`" ], "retrieval_info": { "method": "n_citations", "threshold": 0.114 } } mysql>CALL sys.ML_RETRIEVE_SCHEMA_METADATA(@input, @retrieved, NULL);mysql>SELECT @retrieved;@retrieved CREATE TABLE `mlcorpus`.`singer`( `Singer_ID` int, `Name` varchar, `Birth_Year` double, `Net_Worth_Millions` double, `Citizenship` varchar ); mysql>SELECT GROUP_CONCAT(seg SEPARATOR '\n') INTO @ctx FROM JSON_TABLE(JSON_EXTRACT(@out,'$.citations'), '$[*]' COLUMNS (seg LONGTEXT PATH '$.segment')) AS jt;mysql>SET @final_ctx = CONCAT(@ctx, '\n\nRetrieved tables:\n', @retrieved);mysql>SELECT sys.ML_GENERATE( @input, JSON_OBJECT( "task", "generation", "model_id", "llama3.2-3b-instruct-v1", "context", @final_ctx ) ) INTO @result;mysql>SELECT JSON_UNQUOTE(JSON_EXTRACT(@result,'$.text')) AS generated_sql;generated_sql To train a model to predict the net worth of a singer, you can use the ML_TRAIN routine. First, prepare your dataset, which in this case seems to be the 'singer' table in the 'mlcorpus' schema. Ensure that the table has the necessary columns, such as 'Singer_ID', 'Name', 'Birth_Year', 'Net_Worth_Millions', and 'Citizenship'. The 'Net_Worth_Millions' column will be your target column for prediction. You may need to preprocess your data, for example, converting categorical variables like 'Name' and 'Citizenship' into numerical variables if necessary. Then, you can call the ML_TRAIN routine with the appropriate options. For a regression task like predicting net worth, you would specify the task as 'regression' in the JSON options. Here's a simplified example: ```sql CALL sys.ML_TRAIN('mlcorpus.singer', @model_handle, 'Net_Worth_Millions', JSON_OBJECT('task', 'regression', 'algorithm', 'XGBRegressor')); ``` Replace '@model_handle' with your actual model handle variable. This will train a model to predict the 'Net_Worth_Millions' based on the other columns in your 'singer' table. After training, you can use the ML_PREDICT_ROW or ML_PREDICT_TABLE routine to generate predictions for new, unseen data.