MySQL HeatWave User Guide
ML_SCORE scores a model by
generating predictions using the feature columns in a labeled
dataset as input and comparing the predictions to ground truth
values in the target column of the labeled dataset. The dataset
used with ML_SCORE should have
the same feature columns as the dataset used to train the model
but the data should be different. For example, you might reserve
20 to 30 percent of the labeled training data for scoring.
ML_SCORE returns a computed
metric indicating the quality of the model.
ML_SCORE does not support the
topic modeling model type.
mysql>CALL sys.ML_SCORE(table_name,target_column_name,model_handle,metric,score, [options]);options: { JSON_OBJECT("key","value"[,"key","value"] ...)"key","value": { ['threshold', 'N'] ['topk', 'N'] ['remove_seen', {'true'|'false'}] ['item_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)] ['user_metadata', JSON_OBJECT('table_name'[,'database_name.table_name'] ...)] } }
Set the following required parameters.
table_name: Specifies the fully
qualified name of the table used to compute model quality
(database_name.table_name). The
table must contain the same columns as the training
dataset.
target_column_name: If scoring a
supervised or semi-supervised model, specify the name of
the target column containing ground truth values. If
scoring an unsupervised model, set to
NULL. See
MySQL HeatWave AutoML
Learning Types.
model_handle: Specifies the model
handle or a session variable containing the model handle.
See Work with
Model Handles.
metric: Specifies the name of the
metric. The metric selected must be compatible with the
task type used for training the model.
See Optimization and
Scoring Metrics.
score: Specifies the user-defined
variable name for the computed score. The
ML_SCORE routine populates
the variable. User variables are written as
@.
Any valid name for a user-defined variable is permitted.
var_name
The following options in JSON format are
available for recommendation and anomaly detection models.
Set the following options as needed for recommendation models.
threshold: The optional threshold that
defines positive feedback, and a relevant sample. Only use
with ranking metrics. It can be used for either explicit
or implicit feedback.
topk: The optional top number of
recommendations to provide. The default is
3. Set a positive integer between 1 and
the number of rows in the table.
A recommendation task and ranking
metrics can use both threshold and
topk.
remove_seen: If the input table
overlaps with the training table, and
remove_seen is true,
then the model will not repeat existing interactions. The
default is true. Set
remove_seen to false
to repeat existing interactions from the training table.
item_metadata: As of MySQL 9.5.0, it
defines the table that has item descriptions. It is a JSON
object that has the table_name option
as a key, which specifies the table that has item
descriptions. One column must be the same as the
item_id in the input table.
user_metadata: As of MySQL 9.5.0, it
defines the table that has user descriptions. It is a JSON
object that has the table_name option
as a key, which specifies the table that has user
descriptions. One column must be the same as the
user_id in the input table.
table_name: To be used with the
item_metadata and
user_metadata options. It specifies
the table name that has item or user descriptions. It
must be a string in a fully qualified format
(schema_name.table_name) that specifies the table
name.
As of MySQL 9.4.1, if you run
ML_PREDICT_TABLE with the
log_anomaly_detection task, at least one
column must act as the primary key to establish the temporal
order of logs.
Set the following options as needed for anomaly detection models.
threshold: The threshold you set on
anomaly detection models determines which rows in the
output table are labeled as anomalies with an anomaly
score of 1, or normal with an anomaly
score of 0. The value for the threshold
is the degree to which a row of data or log segment is
considered for anomaly detection. Any sample with an
anomaly score above the threshold is classified an
anomaly. The default value is (1 -
contamination)-th percentile of all the
anomaly scores.
topk: The optional top K rows to
display with the highest anomaly scores. Set a positive
integer between 1 and the number of rows in the table. If
topk is not set,
ML_SCORE uses
threshold.
Do not set both threshold and
topk. Use threshold
or topk, or set
options to
NULL.
The following example runs generates a score by using the
balanced_accuracy metric. Query the
score with the session variable for the
ML_SCORE routine.
mysql>CALL sys.ML_SCORE('census_data.census_train', 'revenue', 'census_data.census_train_admin_1745439945171', 'balanced_accuracy', @score, NULL);Query OK, 0 rows affected (3.0536 sec) mysql>SELECT @score;+--------------------+ | @score | +--------------------+ | 0.8151071071624756 | +--------------------+ 1 row in set (0.0411 sec)
The following example uses the accuracy
metric with a threshold set to 90%.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'accuracy', @score, JSON_OBJECT('threshold', 0.9));Query OK, 0 rows affected (1.86 sec) mysql>SELECT @score;+--------------------+ | @score | +--------------------+ | 0.9791129231452942 | +--------------------+ 1 row in set (0.00 sec)
The following example uses the
precision_at_k metric with a
topk value of 10.
mysql>CALL sys.ML_SCORE('mlcorpus_anomaly_detection.volcanoes-b3_anomaly_train', 'target', @anomaly, 'precision_at_k', @score, JSON_OBJECT('topk', 10));Query OK, 0 rows affected (5.84 sec) mysql>SELECT @score;+---------------------+ | @score | +---------------------+ | 0.30000001192092896 | +---------------------+ 1 row in set (0.0443 sec)
The following example overrides the
ensemble_score value from the
ML_TRAIN routine to a new
value of 0.5.
mysql>CALL sys.ML_SCORE('mlcorpus.anomaly_train_with_target', "target", @semsup_gknn, 'precision_at_k', @semsup_score_gknn_weighted, CAST('{"topk": 10, "experimental": {"semisupervised": {"supervised_submodel_weight": 0.5}}}' as JSON)); Query OK, 0 rows affected (1.7685 sec)mysql>SELECT @score;+---------------------+ | @score | +---------------------+ | 0.98844456000887996 | +---------------------+ 1 row in set (0.0443 sec)