9.20 Non-Negative Matrix Factorization
The oml.nmf class creates a Non-Negative Matrix Factorization (NMF) model for feature extraction.
Each feature extracted by NMF is a linear combination of the original attribute set. Each feature has a set of non-negative coefficients, which measure the weight of each attribute on the feature. If the argument allow.negative.scores is TRUE, then negative coefficients are allowed.
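To make the idea of non-negative coefficients concrete, the following is a minimal sketch of the factorization itself, independent of Oracle Machine Learning. It uses the classic multiplicative update rules in plain Python to approximate a small non-negative matrix V by the product of two non-negative matrices W and H. The function name nmf_sketch, the helper functions, the iteration count, and the toy data are all illustrative assumptions; the in-database algorithm behind oml.nmf may work differently.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Return the Frobenius norm of V - W*H (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf_sketch(v, num_features, num_iterations=200, seed=0, eps=1e-9):
    """Illustrative NMF: factor V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Start from strictly positive random factors.
    w = [[rng.random() + 0.1 for _ in range(num_features)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(num_features)]
    for _ in range(num_iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)]
             for k in range(num_features)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(num_features)]
             for i in range(n)]
    return w, h

# Toy non-negative data: 4 rows (cases) by 3 columns (attributes).
v = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 0.0, 1.0],
     [2.0, 2.0, 4.0]]

w, h = nmf_sketch(v, num_features=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

Because every update multiplies a non-negative entry by a non-negative ratio, the entries of W and H never become negative, which is the property the feature coefficients described above rely on.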
Settings for a Non-Negative Matrix Factorization Model
The following table lists settings that apply to Non-Negative Matrix Factorization models.
Table 9-17 Non-Negative Matrix Factorization Model Settings
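Setting Name | Setting Value | Description |
---|---|---|
NMFS_CONV_TOLERANCE | TO_CHAR(0 < numeric_expr <= 0.5) | Convergence tolerance for the NMF algorithm. Default is 0.05. |
NMFS_NONNEGATIVE_SCORING | NMFS_NONNEG_SCORING_ENABLE or NMFS_NONNEG_SCORING_DISABLE | Whether negative numbers should be allowed in scoring results. When set to NMFS_NONNEG_SCORING_ENABLE, negative feature values are replaced with zeros. When set to NMFS_NONNEG_SCORING_DISABLE, negative feature values are allowed. Default is NMFS_NONNEG_SCORING_ENABLE. |
NMFS_NUM_ITERATIONS | TO_CHAR(1 <= numeric_expr <= 500) | Number of iterations for the NMF algorithm. Default is 50. |
NMFS_RANDOM_SEED | TO_CHAR(numeric_expr) | Random seed for the NMF algorithm. Default is -1. |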
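As a rough illustration of how these settings might be collected before they are applied to a model, the sketch below builds a settings dictionary and rejects misspelled names. The setting names are the ones that appear in the model's Settings output in Example 9-19, and the lowercase key style follows the 'nmfs_conv_tolerance' key used there. The helper check_nmf_settings is hypothetical and is not part of the oml package. It only knows the NMF-specific names, so it would also reject general settings such as the ODMS_ ones.

```python
# NMF-specific setting names as they appear in the model's "Settings" output.
KNOWN_NMF_SETTINGS = {
    "NMFS_CONV_TOLERANCE",
    "NMFS_NONNEGATIVE_SCORING",
    "NMFS_NUM_ITERATIONS",
    "NMFS_RANDOM_SEED",
}

def check_nmf_settings(settings):
    """Hypothetical helper: return a copy of settings, or raise on an unknown name."""
    unknown = sorted(k for k in settings if k.upper() not in KNOWN_NMF_SETTINGS)
    if unknown:
        raise ValueError("Unknown NMF setting(s): " + ", ".join(unknown))
    return dict(settings)

# Values below mirror the settings shown for the first model in Example 9-19.
nmf_settings = check_nmf_settings({
    "nmfs_conv_tolerance": 0.05,
    "nmfs_num_iterations": 50,
    "nmfs_random_seed": -1,
    "nmfs_nonnegative_scoring": "NMFS_NONNEG_SCORING_ENABLE",
})

# With a database connection, the dictionary could then be applied in the
# same way Example 9-19 applies new_setting, for instance:
#     nmf_mod = oml.nmf().set_params(**nmf_settings)
```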
Example 9-19 Using the oml.nmf Class
This example creates an NMF model and uses some of the methods of the oml.nmf class.
import oml
import pandas as pd
from sklearn import datasets
#For an on-premises database, use the following command to connect to the database.
oml.connect("<username>","<password>", dsn="<dsn>")
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
x.insert(0, "ID", range(1, len(x) + 1))
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor', 2:'virginica'}[x], iris.target)), columns = ['Species'])
z = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
#Create training and test data sets.
train_dat, test_dat = oml.sync(table = "IRIS").split()
#Create a Non-Negative Matrix Factorization model using oml.nmf.
nmf_mod = oml.nmf()
#Fit the model to the training data.
nmf_mod = nmf_mod.fit(train_dat)
#Show the model details.
nmf_mod
#Use the model to make predictions on the test data, returning the Sepal_Length, Sepal_Width, Petal_Length, and Species columns in the result.
nmf_mod.predict(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Species']])
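#Transform the test data, returning the top two features for each row along with the Sepal_Length column.
nmf_mod.transform(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length']], topN = 2).sort_values(by = ['Sepal_Length', 'TOP_1', 'TOP_1_VAL'])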
#Feature comparison
nmf_mod.feature_compare(test_dat, compare_cols = ["Sepal_Length", "Petal_Length"], supplemental_cols = ["Species"])
#Set new parameters and refit the model to produce U matrix output.
new_setting = {'nmfs_conv_tolerance':0.05}
nmf_mod2 = nmf_mod.set_params(**new_setting).fit(train_dat, case_id = "ID")
nmf_mod2
Listing for This Example
>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>> #For an on-premises database, use the following command to connect to the database.
>>> oml.connect("<username>","<password>", dsn="<dsn>")
>>> iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data, columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
>>> x.insert(0, "ID", range(1, len(x) + 1))
>>> y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor', 2:'virginica'}[x], iris.target)), columns = ['Species'])
>>> z = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
#Create training and test data sets.
>>> dat = oml.sync(table = "IRIS").split()
>>> train_dat = dat[0]
>>> test_dat = dat[1]
#Create a Non-Negative Matrix Factorization model using oml.nmf.
>>> nmf_mod = oml.nmf()
#Fit the model to the training data.
>>> nmf_mod = nmf_mod.fit(train_dat)
#Show the model details.
>>> nmf_mod
Algorithm Name: Non-Negative Matrix Factorization
Mining Function: FEATURE_EXTRACTION
Settings:
setting name setting value
0 ALGO_NAME ALGO_NONNEGATIVE_MATRIX_FACTOR
1 NMFS_CONV_TOLERANCE .05
2 NMFS_NONNEGATIVE_SCORING NMFS_NONNEG_SCORING_ENABLE
3 NMFS_NUM_ITERATIONS 50
4 NMFS_RANDOM_SEED -1
5 ODMS_DETAILS ODMS_ENABLE
6 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
Computed Settings:
setting name setting value
0 FEAT_NUM_FEATURES 2
1 NMFS_NUM_ITERATIONS 2
2 ODMS_EXPLOSION_MIN_SUPP 1
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 CONV_ERROR 0.0444448
2 ITERATIONS 2
3 NUM_ROWS 111
4 SAMPLE_SIZE 111
Attributes:
ID
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
H:
FEATURE_ID FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE COEFFICIENT
0 1 1 ID None 0.581551
1 1 1 Petal_Length None 0.355323
2 1 1 Petal_Width None 0.158492
3 1 1 Sepal_Length None 0.656558
4 1 1 Sepal_Width None 0.424101
5 1 1 Species setosa 0.089560
6 1 1 Species versicolor 0.534806
7 1 1 Species virginica 0.539590
8 2 2 ID None 0.344647
9 2 2 Petal_Length None 0.506623
10 2 2 Petal_Width None 0.650077
11 2 2 Sepal_Length None 0.170237
12 2 2 Sepal_Width None 0.248640
13 2 2 Species setosa 0.249221
14 2 2 Species versicolor 0.042316
15 2 2 Species virginica 0.093861
W:
FEATURE_ID FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE COEFFICIENT
0 1 1 ID None 0.288559
1 1 1 Petal_Length None -0.062579
2 1 1 Petal_Width None -0.370128
3 1 1 Sepal_Length None 0.502382
4 1 1 Sepal_Width None 0.212611
5 1 1 Species versicolor 0.486970
6 1 1 Species setosa -0.113835
7 1 1 Species virginica 0.450038
8 2 2 ID None 0.119462
9 2 2 Petal_Length None 0.578697
10 2 2 Petal_Width None 0.982575
11 2 2 Sepal_Length None -0.238993
12 2 2 Sepal_Width None 0.082511
13 2 2 Species setosa 0.353453
14 2 2 Species versicolor -0.359264
15 2 2 Species virginica -0.275074
#Use the model to make predictions on the test data, returning the Sepal_Length, Sepal_Width, Petal_Length, and Species columns in the result.
>>> nmf_mod.predict(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Species']])
Sepal_Length Sepal_Width Petal_Length Species FEATURE_ID
0 5.0 3.6 1.4 setosa 2
1 5.0 3.4 1.5 setosa 2
2 4.4 2.9 1.4 setosa 2
3 4.9 3.1 1.5 setosa 2
... ... ... ... ... ...
35 6.9 3.1 5.4 virginica 2
36 5.8 2.7 5.1 virginica 2
37 6.2 3.4 5.4 virginica 2
38 5.9 3.0 5.1 virginica 2
#Transform
>>> nmf_mod.transform(test_dat, supplemental_cols = test_dat[:, ['Sepal_Length']], topN = 2).sort_values(by = ['Sepal_Length', 'TOP_1', 'TOP_1_VAL'])
Sepal_Length TOP_1 TOP_1_VAL TOP_2 TOP_2_VAL
0 4.4 2 0.464041 1 0.000000
1 4.4 2 0.482051 1 0.045518
2 4.8 2 0.475169 1 0.083874
3 4.8 2 0.510372 1 0.101880
... ... ... ... ... ...
35 7.2 1 0.915012 2 0.850330
36 7.2 1 0.938112 2 0.745207
37 7.6 2 0.980757 1 0.864508
38 7.9 1 1.048287 2 0.947744
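The TOP_n and TOP_n_VAL columns in the transform output pair each row with its highest-valued features and their values. The following plain-Python sketch shows that selection for a single row. The function top_n_features is a hypothetical illustration, not part of the oml package, and the two score dictionaries are copied from rows 35 and 37 of the output above.

```python
def top_n_features(scores, top_n):
    """Hypothetical helper: rank one row's feature values, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]
    row = {}
    for rank, (feature_id, value) in enumerate(ranked, start=1):
        row["TOP_%d" % rank] = feature_id
        row["TOP_%d_VAL" % rank] = value
    return row

# Feature values for two rows, keyed by FEATURE_ID.
row_35 = {1: 0.915012, 2: 0.850330}
row_37 = {1: 0.864508, 2: 0.980757}

print(top_n_features(row_35, 2))
# {'TOP_1': 1, 'TOP_1_VAL': 0.915012, 'TOP_2': 2, 'TOP_2_VAL': 0.85033}
print(top_n_features(row_37, 2))
# {'TOP_1': 2, 'TOP_1_VAL': 0.980757, 'TOP_2': 1, 'TOP_2_VAL': 0.864508}
```

Calling the helper with top_n set to 1 keeps only the single strongest feature for the row, which is comparable to the one FEATURE_ID per row that predict returns.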
#Feature comparison
>>> nmf_mod.feature_compare(test_dat, compare_cols = ["Sepal_Length", "Petal_Length"], supplemental_cols = ["Species"])
Species_A Species_B SIMILARITY
0 setosa setosa 0.990134
1 setosa setosa 0.929516
2 setosa setosa 0.976885
3 setosa setosa 0.953770
... ... ... ...
737 virginica virginica 0.849758
738 virginica virginica 0.944063
739 virginica virginica 0.983637
740 virginica virginica 0.958018
[741 rows x 3 columns]
#Set new parameters and refit the model to produce U matrix output.
>>> new_setting = {'nmfs_conv_tolerance':0.05}
>>> nmf_mod2 = nmf_mod.set_params(**new_setting).fit(train_dat, case_id = "ID")
>>> nmf_mod2
Algorithm Name: Non-Negative Matrix Factorization
Mining Function: FEATURE_EXTRACTION
Settings:
setting name setting value
0 ALGO_NAME ALGO_NONNEGATIVE_MATRIX_FACTOR
1 NMFS_CONV_TOLERANCE 0.05
2 NMFS_NONNEGATIVE_SCORING NMFS_NONNEG_SCORING_ENABLE
3 NMFS_NUM_ITERATIONS 50
4 NMFS_RANDOM_SEED -1
5 ODMS_DETAILS ODMS_ENABLE
6 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
Computed Settings:
setting name setting value
0 FEAT_NUM_FEATURES 2
1 NMFS_NUM_ITERATIONS 8
2 ODMS_EXPLOSION_MIN_SUPP 1
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 CONV_ERROR 0.0277253
2 ITERATIONS 8
3 NUM_ROWS 111
4 SAMPLE_SIZE 111
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
H:
FEATURE_ID FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE COEFFICIENT
0 1 1 Petal_Length None 9.889792e-02
1 1 1 Petal_Width None 1.060984e-01
2 1 1 Sepal_Length None 1.947197e-01
3 1 1 Sepal_Width None 5.099539e-01
4 1 1 Species setosa 7.507257e-01
5 1 1 Species versicolor 5.773815e-03
6 1 1 Species virginica 8.136382e-02
7 2 2 Petal_Length None 6.652922e-01
8 2 2 Petal_Width None 6.571416e-01
9 2 2 Sepal_Length None 5.702848e-01
10 2 2 Sepal_Width None 2.420062e-01
11 2 2 Species setosa 1.643131e-08
12 2 2 Species versicolor 5.158020e-01
13 2 2 Species virginica 4.948837e-01
W:
FEATURE_ID FEATURE_NAME ATTRIBUTE_NAME ATTRIBUTE_VALUE COEFFICIENT
0 1 1 Petal_Length None -0.071259
1 1 1 Petal_Width None -0.059774
2 1 1 Sepal_Length None 0.077608
3 1 1 Sepal_Width None 0.571981
4 1 1 Species versicolor -0.144686
5 1 1 Species setosa 0.947005
6 1 1 Species virginica -0.043170
7 2 2 Petal_Length None 0.392684
8 2 2 Petal_Width None 0.385395
9 2 2 Sepal_Length None 0.304214
10 2 2 Sepal_Width None 0.003195
11 2 2 Species setosa -0.221185
12 2 2 Species versicolor 0.325338
13 2 2 Species virginica 0.289804
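One way to read the H matrix is to look for the attributes with the largest coefficients in each feature, since the coefficients measure the weight of each attribute on the feature. The sketch below does this in plain Python with the H values of the refitted model copied from the listing above. The function strongest_attributes is a hypothetical helper, not an oml method.

```python
# (FEATURE_ID, ATTRIBUTE_NAME, ATTRIBUTE_VALUE, COEFFICIENT) rows of H for nmf_mod2.
h_rows = [
    (1, "Petal_Length", None, 9.889792e-02),
    (1, "Petal_Width", None, 1.060984e-01),
    (1, "Sepal_Length", None, 1.947197e-01),
    (1, "Sepal_Width", None, 5.099539e-01),
    (1, "Species", "setosa", 7.507257e-01),
    (1, "Species", "versicolor", 5.773815e-03),
    (1, "Species", "virginica", 8.136382e-02),
    (2, "Petal_Length", None, 6.652922e-01),
    (2, "Petal_Width", None, 6.571416e-01),
    (2, "Sepal_Length", None, 5.702848e-01),
    (2, "Sepal_Width", None, 2.420062e-01),
    (2, "Species", "setosa", 1.643131e-08),
    (2, "Species", "versicolor", 5.158020e-01),
    (2, "Species", "virginica", 4.948837e-01),
]

def strongest_attributes(rows, top=2):
    """Hypothetical helper: list the highest-weighted attributes of each feature."""
    by_feature = {}
    for feature_id, name, value, coefficient in rows:
        # Categorical attributes carry a value, for example Species=setosa.
        label = name if value is None else "%s=%s" % (name, value)
        by_feature.setdefault(feature_id, []).append((coefficient, label))
    return {
        feature_id: [label for _, label in sorted(items, reverse=True)[:top]]
        for feature_id, items in by_feature.items()
    }

print(strongest_attributes(h_rows))
# {1: ['Species=setosa', 'Sepal_Width'], 2: ['Petal_Length', 'Petal_Width']}
```

In this run, feature 1 is weighted most heavily toward the setosa species and sepal width, while feature 2 is weighted toward the petal measurements.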