10.4 Model Tuning
The oml.automl.ModelTuning class tunes the hyperparameters of the specified classification or regression algorithm for the given training data.
Model tuning is a laborious machine learning task that relies heavily on data scientist expertise. With limited user input, the oml.automl.ModelTuning class automates this process, using a highly parallel, asynchronous, gradient-based hyperparameter optimization algorithm to tune the hyperparameters of an Oracle Machine Learning algorithm.
The oml.automl.ModelTuning class supports classification and regression algorithms. To use the oml.automl.ModelTuning class, you specify a data set and an algorithm to obtain a tuned model and its corresponding hyperparameters. An advanced user can provide a customized hyperparameter search space and a non-default scoring metric to this black-box optimizer.
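The following is a minimal sketch of that workflow. It assumes an existing oml.DataFrame proxy object named dat that contains a TARGET column; the names dat and mt are illustrative only, and Example 10-3 below shows the complete, verified workflow.

import oml
from oml import automl

# Assumption: dat is an oml.DataFrame proxy for a database table
# that contains a 'TARGET' column.
X, y = dat.drop('TARGET'), dat['TARGET']

# Create the tuning object and tune a Decision Tree classifier.
mt = automl.ModelTuning(mining_function='classification',
                        score_metric='accuracy', parallel=4)
results = mt.tune('dt', X, y)

tuned_model = results['best_model']        # tuned in-database model
score, params = results['all_evals'][0]    # best score and its hyperparameters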
For a partitioned model, if you pass the column on which to partition in the param_space argument of the tune method, oml.automl.ModelTuning tunes the partitioned model's hyperparameters.
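A minimal sketch of the partitioned case follows. The ODMS_PARTITION_COLUMNS key and the REGION column name are assumptions for illustration; ODMS_PARTITION_COLUMNS is the Oracle Machine Learning setting that names partition columns, but confirm the exact param_space format that the tune method expects with help(oml.automl.ModelTuning) or the API reference.

# A sketch only, assuming the partition column is passed through the
# ODMS_PARTITION_COLUMNS setting in param_space; 'REGION' is a hypothetical
# column name, and mt is the ModelTuning object from the sketch above.
search_space = {'ODMS_PARTITION_COLUMNS': {'type': 'categorical',
                                           'range': ['REGION']}}
results = mt.tune('dt', X, y, param_space=search_space)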
For information on the parameters and methods of the class, invoke help(oml.automl.ModelTuning) or see Oracle Machine Learning for Python API Reference.
Example 10-3 Using the oml.automl.ModelTuning Class
This example creates an oml.automl.ModelTuning object and uses it to tune Decision Tree and Random Forest models on the breast cancer data set.
import oml
from oml import automl
import pandas as pd
from sklearn import datasets
# Load the breast cancer data set.
bc = datasets.load_breast_cancer()
bc_data = bc.data.astype(float)
X = pd.DataFrame(bc_data, columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
# Create the database table BreastCancer.
oml_df = oml.create(pd.concat([X, y], axis=1),
                    table = 'BreastCancer')
# Split the data set into training and test data.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X, y = train.drop('TARGET'), train['TARGET']
X_test, y_test = test.drop('TARGET'), test['TARGET']
# Start an automated model tuning run with a Decision Tree model.
at = automl.ModelTuning(mining_function='classification', score_metric='accuracy',
                        parallel=4)
results = at.tune('dt', X, y)
# Show the tuned model details.
tuned_model = results['best_model']
tuned_model
# Show the best tuned model train score and the
# corresponding hyperparameters.
score, params = results['all_evals'][0]
"{:.2}".format(score), ["{}:{}".format(k, params[k])
for k in sorted(params)]
# Use the tuned model to get the score on the test set.
"{:.2}".format(tuned_model.score(X_test, y_test))
# An example invocation of model tuning with user-defined
# search ranges for selected hyperparameters on a new tuning
# metric (f1_macro).
search_space = {
    'RFOR_SAMPLING_RATIO': {'type': 'continuous',
                            'range': [0.01, 0.5]},
    'RFOR_NUM_TREES': {'type': 'discrete',
                       'range': [50, 100]},
    'TREE_IMPURITY_METRIC': {'type': 'categorical',
                             'range': ['TREE_IMPURITY_ENTROPY',
                                       'TREE_IMPURITY_GINI']},}
results = at.tune('rf', X, y, score_metric='f1_macro', param_space=search_space)
score, params = results['all_evals'][0]
("{:.2}".format(score), ["{}:{}".format(k, params[k])
for k in sorted(params)])
# Some hyperparameter search ranges need to be defined based on the
# training data set sizes (for example, the number of samples and
# features). You can use placeholders specific to the data set,
# such as $nr_features and $nr_samples, as the search ranges.
search_space = {'RFOR_MTRY': {'type': 'discrete',
                              'range': [1, '$nr_features/2']}}
results = at.tune('rf', X, y, param_space=search_space)
score, params = results['all_evals'][0]
("{:.2}".format(score), ["{}:{}".format(k, params[k])
for k in sorted(params)])
# Drop the database table.
oml.drop('BreastCancer')
Listing for This Example
>>> import oml
>>> from oml import automl
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the breast cancer data set.
... bc = datasets.load_breast_cancer()
>>> bc_data = bc.data.astype(float)
>>> X = pd.DataFrame(bc_data, columns = bc.feature_names)
>>> y = pd.DataFrame(bc.target, columns = ['TARGET'])
>>>
>>> # Create the database table BreastCancer.
>>> oml_df = oml.create(pd.concat([X, y], axis=1),
...                     table = 'BreastCancer')
>>>
>>> # Split the data set into training and test data.
... train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
>>> X, y = train.drop('TARGET'), train['TARGET']
>>> X_test, y_test = test.drop('TARGET'), test['TARGET']
>>>
>>> # Start an automated model tuning run with a Decision Tree model.
... at = automl.ModelTuning(mining_function='classification', score_metric='accuracy',
...                         parallel=4)
>>> results = at.tune('dt', X, y)
>>>
>>> # Show the tuned model details.
... tuned_model = results['best_model']
>>> tuned_model
Algorithm Name: Decision Tree
Mining Function: CLASSIFICATION
Target: TARGET
Settings:
                    setting name            setting value
0                      ALGO_NAME       ALGO_DECISION_TREE
1              CLAS_MAX_SUP_BINS                       32
2          CLAS_WEIGHTS_BALANCED                      OFF
3                   ODMS_DETAILS             ODMS_DISABLE
4   ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO
5                  ODMS_SAMPLING    ODMS_SAMPLING_DISABLE
6                      PREP_AUTO                       ON
7           TREE_IMPURITY_METRIC       TREE_IMPURITY_GINI
8            TREE_TERM_MAX_DEPTH                        8
9          TREE_TERM_MINPCT_NODE                     3.34
10        TREE_TERM_MINPCT_SPLIT                      0.1
11         TREE_TERM_MINREC_NODE                       10
12        TREE_TERM_MINREC_SPLIT                       20
Attributes:
mean radius
mean texture
mean perimeter
mean area
mean smoothness
mean compactness
mean concavity
mean concave points
mean symmetry
mean fractal dimension
radius error
texture error
perimeter error
area error
smoothness error
compactness error
concavity error
concave points error
symmetry error
fractal dimension error
worst radius
worst texture
worst perimeter
worst area
worst smoothness
worst compactness
worst concavity
worst concave points
worst symmetry
worst fractal dimension
Partition: NO
>>>
>>> # Show the best tuned model train score and the
... # corresponding hyperparameters.
... score, params = results['all_evals'][0]
>>> "{:.2}".format(score), ["{}:{}".format(k, params[k])
... for k in sorted(params)]
('0.92', ['CLAS_MAX_SUP_BINS:32', 'TREE_IMPURITY_METRIC:TREE_IMPURITY_GINI', 'TREE_TERM_MAX_DEPTH:7', 'TREE_TERM_MINPCT_NODE:0.05', 'TREE_TERM_MINPCT_SPLIT:0.1'])
>>>
>>> # Use the tuned model to get the score on the test set.
... "{:.2}".format(tuned_model.score(X_test, y_test))
'0.92'
>>>
>>> # An example invocation of model tuning with user-defined
... # search ranges for selected hyperparameters on a new tuning
... # metric (f1_macro).
... search_space = {
...     'RFOR_SAMPLING_RATIO': {'type': 'continuous',
...                             'range': [0.01, 0.5]},
...     'RFOR_NUM_TREES': {'type': 'discrete',
...                        'range': [50, 100]},
...     'TREE_IMPURITY_METRIC': {'type': 'categorical',
...                              'range': ['TREE_IMPURITY_ENTROPY',
...                                        'TREE_IMPURITY_GINI']},}
>>> results = at.tune('rf', X, y, score_metric='f1_macro', param_space=search_space)
>>> score, params = results['all_evals'][0]
>>> ("{:.2}".format(score), ["{}:{}".format(k, params[k])
... for k in sorted(params)])
('0.92', ['RFOR_NUM_TREES:53', 'RFOR_SAMPLING_RATIO:0.4999951', 'TREE_IMPURITY_METRIC:TREE_IMPURITY_ENTROPY'])
>>>
>>> # Some hyperparameter search ranges need to be defined based on the
... # training data set sizes (for example, the number of samples and
... # features). You can use placeholders specific to the data set,
... # such as $nr_features and $nr_samples, as the search ranges.
... search_space = {'RFOR_MTRY': {'type': 'discrete',
...                               'range': [1, '$nr_features/2']}}
>>> results = at.tune('rf', X, y, param_space=search_space)
>>> score, params = results['all_evals'][0]
>>> ("{:.2}".format(score), ["{}:{}".format(k, params[k])
... for k in sorted(params)])
('0.93', ['RFOR_MTRY:10'])
>>>
>>> # Drop the database table.
... oml.drop('BreastCancer')