9.7 Attribute Importance
The oml.ai class computes the relative attribute importance, which ranks attributes according to their significance in predicting a classification or regression target.
The oml.ai class uses the Minimum Description Length (MDL) algorithm to calculate attribute importance. MDL assumes that the simplest, most compact representation of the data is the best and most probable explanation of the data.
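The MDL idea can be made concrete with a small, self-contained sketch. It is not Oracle's implementation. It assumes a discrete predictor and uses a simple two-part code: the target labels are entropy-coded within each predictor value (data cost), and each estimated class probability is charged 0.5 * log2(n) bits (model cost, a common textbook penalty chosen here as an assumption). The helper names `entropy_bits`, `description_length`, and `mdl_gain` are hypothetical. An attribute scores well when conditioning on it shortens the total description of the target.

```python
import math
from collections import Counter, defaultdict


def entropy_bits(labels):
    """Shannon entropy, in bits per label, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def description_length(n, groups, n_classes):
    """Two-part code length in bits (hypothetical helper).

    `groups` maps each predictor value to the target labels seen with it.
    Data cost: entropy-coded labels inside each group.
    Model cost: (n_classes - 1) probabilities per group, 0.5 * log2(n) bits each.
    """
    data_bits = sum(len(g) * entropy_bits(g) for g in groups.values())
    model_bits = len(groups) * (n_classes - 1) * 0.5 * math.log2(n)
    return data_bits + model_bits


def mdl_gain(predictor, target):
    """Bits saved by describing `target` given a discrete `predictor`.

    Positive: the predictor compresses the target (informative).
    Negative: the extra model cost is not repaid by any compression.
    """
    n = len(target)
    n_classes = len(set(target))
    # Baseline: one group holding every label, i.e. no predictor at all.
    baseline = description_length(n, {None: list(target)}, n_classes)
    groups = defaultdict(list)
    for x, y in zip(predictor, target):
        groups[x].append(y)
    return baseline - description_length(n, groups, n_classes)


y = ['setosa', 'versicolor'] * 50
informative = list(y)              # predictor identical to the target
noise = ['p', 'p', 'q', 'q'] * 25  # each value sees both classes equally often

print(round(mdl_gain(informative, y), 2))  # 96.68
print(round(mdl_gain(noise, y), 2))        # -3.32
```

Under these assumptions an uninformative attribute can receive a negative score, because the bits spent describing its per-value model buy no compression of the target.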
You can use methods of the oml.ai class to compute the relative importance of predictor variables when predicting a response variable.
Note:
Oracle Machine Learning does not support the scoring operation for oml.ai.
The results of oml.ai are the attributes of the build data ranked according to their predictive influence on a specified target attribute. You can use the ranking and the measure of importance to select attributes.
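A minimal sketch of attribute selection from such a ranking, in plain Python. It assumes the importance table has already been retrieved from the fitted model as (variable, importance, rank) rows; the values below are copied from the model details printed in Example 9-7. The function `select_attributes` and its `min_importance` and `top_k` parameters are hypothetical choices for illustration, not part of the oml.ai API.

```python
# Importance table copied from the model details printed in Example 9-7.
# Each row is (variable, importance, rank).
importance = [
    ('Petal_Width', 0.615851, 1),
    ('Petal_Length', 0.362519, 2),
    ('Sepal_Length', 0.042751, 3),
    ('Sepal_Width', -0.155867, 4),
]


def select_attributes(rows, min_importance=0.0, top_k=None):
    """Return attribute names in rank order (hypothetical helper).

    Keeps only attributes whose importance is strictly greater than
    `min_importance`; if `top_k` is given, keeps at most that many.
    """
    ranked = sorted(rows, key=lambda row: row[2])
    kept = [name for name, score, _ in ranked if score > min_importance]
    return kept if top_k is None else kept[:top_k]


print(select_attributes(importance))
# ['Petal_Width', 'Petal_Length', 'Sepal_Length']
print(select_attributes(importance, top_k=2))
# ['Petal_Width', 'Petal_Length']
```

Dropping attributes whose importance is zero or below is only one heuristic; the threshold and the number of attributes to keep are assumptions to tune for each data set.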
For information on the oml.ai class attributes and methods, invoke help(oml.ai) or see Oracle Machine Learning for Python API Reference.
Example 9-7 Ranking Attribute Significance with oml.ai
This example creates the x and y variables using the iris data set. It then creates the persistent database table IRIS and the oml.DataFrame object oml_iris as a proxy for the table.
This example demonstrates the use of various methods of the oml.ai class.
import oml
import pandas as pd
from sklearn import datasets

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

# Drop the IRIS table if it already exists; ignore the error if it does not.
try:
    oml.drop('IRIS')
except:
    pass

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]

# Specify settings: disable sampling of the build data.
setting = {'ODMS_SAMPLING':'ODMS_SAMPLING_DISABLE'}

# Create an AI (attribute importance) model object.
ai_mod = oml.ai(**setting)

# Fit the AI model according to the training data and parameter
# settings.
ai_mod = ai_mod.fit(train_x, train_y)

# Show the model details.
ai_mod
Listing for This Example
>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data,
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x:
...                            {0: 'setosa', 1: 'versicolor',
...                             2:'virginica'}[x], iris.target)),
...                  columns = ['Species'])
>>>
>>> try:
...     oml.drop('IRIS')
... except:
...     pass
>>>
>>> # Create the IRIS database table and the proxy object for the table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training and test data.
... dat = oml.sync(table = 'IRIS').split()
>>> train_x = dat[0].drop('Species')
>>> train_y = dat[0]['Species']
>>> test_dat = dat[1]
>>>
>>> # Specify settings.
... setting = {'ODMS_SAMPLING':'ODMS_SAMPLING_DISABLE'}
>>>
>>> # Create an AI model object.
... ai_mod = oml.ai(**setting)
>>>
>>> # Fit the AI model according to the training data and parameter
... # settings.
>>> ai_mod = ai_mod.fit(train_x, train_y)
>>>
>>> # Show the model details.
... ai_mod
Algorithm Name: Attribute Importance
Mining Function: ATTRIBUTE_IMPORTANCE
Settings:
                   setting name            setting value
0                     ALGO_NAME              ALGO_AI_MDL
1                  ODMS_DETAILS              ODMS_ENABLE
2  ODMS_MISSING_VALUE_TREATMENT  ODMS_MISSING_VALUE_AUTO
3                 ODMS_SAMPLING    ODMS_SAMPLING_DISABLE
4                     PREP_AUTO                       ON
Global Statistics:
   attribute name  attribute value
0        NUM_ROWS              104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Partition: NO
Importance:
       variable  importance  rank
0   Petal_Width    0.615851     1
1  Petal_Length    0.362519     2
2  Sepal_Length    0.042751     3
3   Sepal_Width   -0.155867     4