9.10 Expectation Maximization

The oml.em class uses the Expectation Maximization (EM) algorithm to create a clustering model.

EM is a density estimation algorithm that performs probabilistic clustering. In density estimation, the goal is to construct a density function that captures how a given population is distributed. The density estimate is based on observed data that represents a sample of the population.
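
The idea of probabilistic clustering through density estimation can be illustrated with a minimal sketch in plain Python. The sketch is not part of OML4Py and does not show how the in-database algorithm is implemented. It fits a two-component, one-dimensional Gaussian mixture to synthetic data, and the names `gaussian_pdf`, `membership`, and `fit_em_1d` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def membership(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


def fit_em_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    # Crude initialization: place the two means at the extremes of the data.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: soft assignment of every point to every component.
        resp = [membership(x, weights, means, variances) for x in data]
        # M-step: re-estimate the parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic sample drawn from two well-separated populations.
random.seed(0)
sample = ([random.gauss(0.0, 1.0) for _ in range(200)]
          + [random.gauss(5.0, 1.0) for _ in range(200)])

weights, means, variances = fit_em_1d(sample)
print("weights:", weights)
print("means:", means)
print("membership of x = 0:", membership(0.0, weights, means, variances))
```

The fitted mixture is the density estimate, and the membership probabilities it assigns to a data point are what make the clustering probabilistic rather than hard.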

For information on the oml.em class methods, invoke help(oml.em) or see Oracle Machine Learning for Python API Reference.

Settings for an Expectation Maximization Model

The following table lists settings for data preparation and analysis for EM models.

Table 9-5 Expectation Maximization Settings for Data Preparation and Analysis

Setting Name	Setting Value	Description
`EMCS_ATTRIBUTE_FILTER`	`EMCS_ATTR_FILTER_ENABLE` `EMCS_ATTR_FILTER_DISABLE`	Whether or not to include uncorrelated attributes in the model. When `EMCS_ATTRIBUTE_FILTER` is enabled, uncorrelated attributes are not included. Note: This setting applies only to attributes that are not nested. The default value is system-determined.
`EMCS_MAX_NUM_ATTR_2D`	`TO_CHAR(numeric_expr >= 1)`	Maximum number of correlated attributes to include in the model. Note: This setting applies only to attributes that are not nested (2D). The default value is `50`.
`EMCS_NUM_DISTRIBUTION`	`EMCS_NUM_DISTR_BERNOULLI` `EMCS_NUM_DISTR_GAUSSIAN` `EMCS_NUM_DISTR_SYSTEM`	The distribution for modeling numeric attributes. Applies to the input table or view as a whole and does not allow per-attribute specifications. The options include Bernoulli, Gaussian, or system-determined distribution. When Bernoulli or Gaussian distribution is chosen, all numeric attributes are modeled using the same type of distribution. When the distribution is system-determined, individual attributes may use different distributions (either Bernoulli or Gaussian), depending on the data. The default value is `EMCS_NUM_DISTR_SYSTEM`.
`EMCS_NUM_EQUIWIDTH_BINS`	`TO_CHAR(1 < numeric_expr <= 255)`	Number of equi-width bins that will be used for gathering cluster statistics for numeric columns. The default value is `11`.
`EMCS_NUM_PROJECTIONS`	`TO_CHAR(numeric_expr >= 1)`	Specifies the number of projections to use for each nested column. If a column has fewer distinct attributes than the specified number of projections, then the data is not projected. The setting applies to all nested columns. The default value is `50`.
`EMCS_NUM_QUANTILE_BINS`	`TO_CHAR(1 < numeric_expr <= 255)`	Specifies the number of quantile bins to use for modeling numeric columns with multivalued Bernoulli distributions. The default value is system-determined.
`EMCS_NUM_TOPN_BINS`	`TO_CHAR(1 < numeric_expr <= 255)`	Specifies the number of top-N bins to use for modeling categorical columns with multivalued Bernoulli distributions. The default value is system-determined.
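
Because settings are passed to the model as a dictionary, it can be convenient to check values against the ranges in Table 9-5 before creating a model. The following sketch assumes a hypothetical helper, `check_em_settings`, that is not part of the oml package. It only mirrors the ranges and value names listed in the table, and it assumes that setting names can be compared without regard to case.

```python
# Hypothetical helper, not part of the oml package. The ranges and value
# names below are copied from Table 9-5.
NUMERIC_RANGES = {
    # name: (lower bound, lower bound is inclusive, upper bound or None)
    'EMCS_MAX_NUM_ATTR_2D': (1, True, None),
    'EMCS_NUM_EQUIWIDTH_BINS': (1, False, 255),
    'EMCS_NUM_PROJECTIONS': (1, True, None),
    'EMCS_NUM_QUANTILE_BINS': (1, False, 255),
    'EMCS_NUM_TOPN_BINS': (1, False, 255),
}

ALLOWED_VALUES = {
    'EMCS_ATTRIBUTE_FILTER': {'EMCS_ATTR_FILTER_ENABLE',
                              'EMCS_ATTR_FILTER_DISABLE'},
    'EMCS_NUM_DISTRIBUTION': {'EMCS_NUM_DISTR_BERNOULLI',
                              'EMCS_NUM_DISTR_GAUSSIAN',
                              'EMCS_NUM_DISTR_SYSTEM'},
}


def check_em_settings(settings):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = []
    for name, value in settings.items():
        key = name.upper()  # assumption: names are treated case-insensitively
        if key in NUMERIC_RANGES:
            low, inclusive, high = NUMERIC_RANGES[key]
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{key}: {value!r} is not numeric")
                continue
            too_low = number < low if inclusive else number <= low
            too_high = high is not None and number > high
            if too_low or too_high:
                problems.append(f"{key}: {value!r} is out of range")
        elif key in ALLOWED_VALUES:
            if str(value).upper() not in ALLOWED_VALUES[key]:
                problems.append(f"{key}: {value!r} is not a listed value")
        # Settings that do not appear in Table 9-5 are left unchecked.
    return problems


setting = {'emcs_num_equiwidth_bins': 11,
           'EMCS_NUM_DISTRIBUTION': 'EMCS_NUM_DISTR_GAUSSIAN'}
print(check_em_settings(setting))
print(check_em_settings({'EMCS_NUM_TOPN_BINS': 256}))
```

A dictionary that passes the check can then be supplied to the model in the same way as the settings in Example 9-10.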

The following table lists settings for learning for EM models.

Table 9-6 Expectation Maximization Settings for Learning

Setting Name	Setting Value	Description
`EMCS_CONVERGENCE_CRITERION`	`EMCS_CONV_CRIT_HELDASIDE` `EMCS_CONV_CRIT_BIC`	The convergence criterion for EM. The criterion may be based on a held-aside data set or on the Bayesian Information Criterion (BIC). The default value is system-determined.
`EMCS_LOGLIKE_IMPROVEMENT`	`TO_CHAR(0 < numeric_expr < 1)`	When the convergence criterion is based on a held-aside data set (`EMCS_CONVERGENCE_CRITERION` = `EMCS_CONV_CRIT_HELDASIDE`), this setting specifies the percentage improvement in the value of the log likelihood function that is required for adding a new component to the model. The default value is `0.001`.
`EMCS_MODEL_SEARCH`	`EMCS_MODEL_SEARCH_ENABLE` `EMCS_MODEL_SEARCH_DISABLE`	Enables model search in EM where different model sizes are explored and the best size is selected. The default value is `EMCS_MODEL_SEARCH_DISABLE`.
`EMCS_NUM_COMPONENTS`	`TO_CHAR(numeric_expr >= 1)`	Maximum number of components in the model. If model search is enabled, the algorithm automatically determines the number of components based on improvements in the likelihood function or based on regularization, up to the specified maximum. The number of components must be greater than or equal to the number of clusters. The default value is `20`.
`EMCS_NUM_ITERATIONS`	`TO_CHAR(numeric_expr >= 1)`	Specifies the maximum number of iterations in the EM algorithm. The default value is `100`.
`EMCS_RANDOM_SEED`	Non-negative integer	Controls the seed of the random generator used in EM. The default value is `0`.
`EMCS_REMOVE_COMPONENTS`	`EMCS_REMOVE_COMPS_ENABLE` `EMCS_REMOVE_COMPS_DISABLE`	Allows the EM algorithm to remove a small component from the solution. The default value is `EMCS_REMOVE_COMPS_ENABLE`.
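
The interaction between `EMCS_LOGLIKE_IMPROVEMENT` and `EMCS_NUM_COMPONENTS` during model search can be sketched as a simple stopping rule. The sketch below is an assumption about the general shape of such a rule, not the in-database implementation. The function `choose_num_components` is hypothetical, the log likelihood values are made up, and the improvement is assumed to be measured relative to the previous model's log likelihood.

```python
def choose_num_components(loglik_by_size, min_improvement=0.001,
                          max_components=20):
    """Pick a model size from held-aside log likelihoods (illustrative only).

    loglik_by_size[i] is the held-aside log likelihood of a model with
    i + 1 components. A component is added only while the relative gain
    is at least min_improvement and the maximum size is not exceeded.
    """
    chosen = 1
    limit = min(max_components, len(loglik_by_size))
    for size in range(2, limit + 1):
        previous = loglik_by_size[size - 2]
        current = loglik_by_size[size - 1]
        if previous == 0:
            break  # relative gain is undefined
        # Assumption: improvement is relative to the previous log likelihood.
        relative_gain = (current - previous) / abs(previous)
        if relative_gain < min_improvement:
            break
        chosen = size
    return chosen


# Made-up held-aside log likelihoods for models with 1 to 5 components.
logliks = [-3.0, -2.5, -2.2, -2.1999, -2.0]
print(choose_num_components(logliks))
print(choose_num_components(logliks, max_components=2))
```

In this sketch the search stops at the first size that fails to improve enough, so a later, larger gain is never examined. Whether the actual algorithm behaves this way is not stated in this section.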

The following table lists the settings for component clustering for EM models.

Table 9-7 Expectation Maximization Settings for Component Clustering

Setting Name	Setting Value	Description
`CLUS_NUM_CLUSTERS`	`TO_CHAR(numeric_expr >= 1)`	The maximum number of leaf clusters generated by the algorithm. The algorithm may return fewer clusters than the specified number, depending on the data, but it cannot return more clusters than the number of components, which is governed by algorithm-specific settings (see Table 9-6). Depending on these settings, there may be fewer clusters than components. If component clustering is disabled, then the number of clusters equals the number of components. The default value is system-determined.
`EMCS_CLUSTER_COMPONENTS`	`EMCS_CLUSTER_COMP_ENABLE` `EMCS_CLUSTER_COMP_DISABLE`	Enables or disables the grouping of EM components into high-level clusters. When disabled, the components themselves are treated as clusters. When component clustering is enabled, model scoring through the SQL `CLUSTER` function produces assignments to the higher level clusters. When clustering is disabled, the `CLUSTER` function produces assignments to the original components. The default value is `EMCS_CLUSTER_COMP_ENABLE`.
`EMCS_CLUSTER_THRESH`	`TO_CHAR(numeric_expr >= 1)`	Dissimilarity threshold that controls the clustering of EM components. When the dissimilarity measure is less than the threshold, the components are combined into a single cluster. A lower threshold may produce more clusters that are more compact. A higher threshold may produce fewer clusters that are more spread out. The default value is `2`.
`EMCS_LINKAGE_FUNCTION`	`EMCS_LINKAGE_SINGLE` `EMCS_LINKAGE_AVERAGE` `EMCS_LINKAGE_COMPLETE`	Allows the specification of a linkage function for the agglomerative clustering step. `EMCS_LINKAGE_SINGLE` uses the nearest distance within the branch. The clusters tend to be larger and have arbitrary shapes. `EMCS_LINKAGE_AVERAGE` uses the average distance within the branch. There is less chaining effect and the clusters are more compact. `EMCS_LINKAGE_COMPLETE` uses the maximum distance within the branch. The clusters are smaller and require strong component overlap. The default value is `EMCS_LINKAGE_SINGLE`.
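
The effect of the threshold and the linkage function can be seen in a small agglomerative clustering sketch. This is a generic illustration under stated assumptions, not the in-database algorithm. The function `cluster_components` is hypothetical, the dissimilarities are made-up distances between four components placed on a line, and the dissimilarity measure that EM actually uses is not described in this section.

```python
from itertools import combinations

# How each linkage summarizes the pairwise dissimilarities between two groups.
LINKAGES = {
    'single': min,
    'average': lambda values: sum(values) / len(values),
    'complete': max,
}


def cluster_components(dissim, threshold=2.0, linkage='single'):
    """Merge components while the closest pair of groups is under threshold."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(dissim))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairwise = [dissim[i][j] for i in clusters[a] for j in clusters[b]]
            distance = combine(pairwise)
            if best is None or distance < best[0]:
                best = (distance, a, b)
        if best[0] >= threshold:
            break  # no remaining pair is similar enough to combine
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Made-up example: four components at these positions on a line.
positions = [0.0, 1.5, 3.0, 10.0]
dissim = [[abs(p - q) for q in positions] for p in positions]

for name in ('single', 'average', 'complete'):
    print(name, cluster_components(dissim, threshold=2.0, linkage=name))
```

With these values, single linkage chains the first three components into one cluster, while average and complete linkage stop after combining the first two, which matches the descriptions of the linkage functions in Table 9-7.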

The following table lists the settings for cluster statistics for EM models.

Table 9-8 Expectation Maximization Settings for Cluster Statistics

Setting Name	Setting Value	Description
`EMCS_CLUSTER_STATISTICS`	`EMCS_CLUS_STATS_ENABLE` `EMCS_CLUS_STATS_DISABLE`	Enables or disables the gathering of descriptive statistics for clusters (centroids, histograms, and rules). When statistics are disabled, model size is reduced. The default value is `EMCS_CLUS_STATS_ENABLE`.
`EMCS_MIN_PCT_ATTR_SUPPORT`	`TO_CHAR(0 < numeric_expr < 1)`	Minimum support required for including an attribute in the cluster rule. The support is the percentage of the data rows assigned to a cluster that must have non-null values for the attribute. The default value is `0.1`.
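
The support calculation described for `EMCS_MIN_PCT_ATTR_SUPPORT` can be sketched in a few lines. The functions `attribute_support` and `attributes_for_rule` are hypothetical, the rows are made up, and treating the threshold as inclusive is an assumption.

```python
def attribute_support(rows, attribute):
    """Fraction of rows whose value for the attribute is not null (None)."""
    if not rows:
        return 0.0
    non_null = sum(1 for row in rows if row.get(attribute) is not None)
    return non_null / len(rows)


def attributes_for_rule(rows, attributes, min_support=0.1):
    """Attributes with enough non-null values to appear in a cluster rule."""
    # Assumption: an attribute exactly at the threshold is kept.
    return [name for name in attributes
            if attribute_support(rows, name) >= min_support]


# Made-up rows assigned to one cluster: Sepal_Width is present in half of
# them, Petal_Width in only one of twenty.
cluster_rows = [{'Sepal_Width': 3.0 if i % 2 == 0 else None,
                 'Petal_Width': 0.2 if i == 0 else None}
                for i in range(20)]

print(attribute_support(cluster_rows, 'Sepal_Width'))
print(attributes_for_rule(cluster_rows, ['Sepal_Width', 'Petal_Width']))
```

With the default of `0.1`, the sparsely populated attribute is left out of the rule, while the attribute that is populated in half of the rows is kept.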

Example 9-10 Using the oml.em Class

This example creates an EM model and uses some of the methods of the oml.em class.

import oml
import pandas as pd
from sklearn import datasets 

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

try:
    oml.drop('IRIS')
except: 
    pass

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_dat = dat[0]
test_dat = dat[1]

# Specify settings.
setting = {'emcs_num_iterations': 100}

# Create an EM model object.
em_mod = oml.em(n_clusters = 2, **setting)

# Fit the EM model according to the training data and parameter
# settings.
em_mod = em_mod.fit(train_dat)

# Show details of the model.
em_mod

# Use the model to make predictions on the test data.
em_mod.predict(test_dat)

# Make predictions and return the probability for each class
# on new data.
em_mod.predict_proba(test_dat, 
  supplemental_cols = test_dat[:, 
    ['Sepal_Length', 'Sepal_Width', 
     'Petal_Length']]).sort_values(by = ['Sepal_Length', 
       'Sepal_Width',  'Petal_Length', 
       'PROBABILITY_OF_2', 'PROBABILITY_OF_3'])

# Change the random seed and refit the model.
em_mod.set_params(EMCS_RANDOM_SEED = '5').fit(train_dat)

Listing for This Example

>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data, 
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x: 
...                            {0: 'setosa', 1: 'versicolor', 
...                             2:'virginica'}[x], iris.target)), 
...                  columns = ['Species'])
>>>
>>> try:
...    oml.drop('IRIS')
... except: 
...    pass
>>>
>>> # Create the IRIS database table and the proxy object for the table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training and test data.
... dat = oml.sync(table = 'IRIS').split()
>>> train_dat = dat[0]
>>> test_dat = dat[1]
>>> 
>>> # Specify settings.
... setting = {'emcs_num_iterations': 100}
>>> 
>>> # Create an EM model object.
... em_mod = oml.em(n_clusters = 2, **setting)
>>> 
>>> # Fit the EM model according to the training data and parameter
... # settings.
>>> em_mod = em_mod.fit(train_dat)
>>> 
>>> # Show details of the model.
... em_mod

Algorithm Name: Expectation Maximization

Mining Function: CLUSTERING

Settings: 
                    setting name                  setting value
0                      ALGO_NAME  ALGO_EXPECTATION_MAXIMIZATION
1              CLUS_NUM_CLUSTERS                              2
2        EMCS_CLUSTER_COMPONENTS       EMCS_CLUSTER_COMP_ENABLE
3        EMCS_CLUSTER_STATISTICS         EMCS_CLUS_STATS_ENABLE
4            EMCS_CLUSTER_THRESH                              2
5          EMCS_LINKAGE_FUNCTION            EMCS_LINKAGE_SINGLE
6       EMCS_LOGLIKE_IMPROVEMENT                           .001
7           EMCS_MAX_NUM_ATTR_2D                             50
8      EMCS_MIN_PCT_ATTR_SUPPORT                             .1
9              EMCS_MODEL_SEARCH      EMCS_MODEL_SEARCH_DISABLE
10           EMCS_NUM_COMPONENTS                             20
11         EMCS_NUM_DISTRIBUTION          EMCS_NUM_DISTR_SYSTEM
12       EMCS_NUM_EQUIWIDTH_BINS                             11
13           EMCS_NUM_ITERATIONS                            100
14          EMCS_NUM_PROJECTIONS                             50
15              EMCS_RANDOM_SEED                              0
16        EMCS_REMOVE_COMPONENTS       EMCS_REMOVE_COMPS_ENABLE
17                  ODMS_DETAILS                    ODMS_ENABLE
18  ODMS_MISSING_VALUE_TREATMENT        ODMS_MISSING_VALUE_AUTO
19                 ODMS_SAMPLING          ODMS_SAMPLING_DISABLE
20                     PREP_AUTO                             ON

Computed Settings: 
                 setting name             setting value
0       EMCS_ATTRIBUTE_FILTER  EMCS_ATTR_FILTER_DISABLE
1  EMCS_CONVERGENCE_CRITERION        EMCS_CONV_CRIT_BIC
2      EMCS_NUM_QUANTILE_BINS                         3
3          EMCS_NUM_TOPN_BINS                         3

Global Statistics: 
       attribute name  attribute value
0           CONVERGED              YES
1       LOGLIKELIHOOD         -2.10044
2        NUM_CLUSTERS                2
3      NUM_COMPONENTS                8
4            NUM_ROWS              104
5         RANDOM_SEED                0
6  REMOVED_COMPONENTS               12

Attributes: 
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species

Partition: NO

Clusters: 

   CLUSTER_ID  CLUSTER_NAME  RECORD_COUNT  PARENT  TREE_LEVEL \
0           1             1           104     NaN           1  
1           2             2            68     1.0           2
2           3             3            36     1.0           2
  LEFT_CHILD_ID  RIGHT_CHILD_ID
0           2.0             3.0
1           NaN             NaN
2           NaN             NaN

Taxonomy: 

   PARENT_CLUSTER_ID  CHILD_CLUSTER_ID
0                  1               2.0
1                  1               3.0
2                  2               NaN
3                  3               NaN

Centroids: 

    CLUSTER_ID ATTRIBUTE_NAME      MEAN  MODE_VALUE  VARIANCE
0            1   Petal_Length  3.721154        None  3.234694
1            1    Petal_Width  1.155769        None  0.567539
2            1   Sepal_Length  5.831731        None  0.753255
3            1    Sepal_Width  3.074038        None  0.221358
4            1        Species       NaN      setosa       NaN
5            2   Petal_Length  4.902941        None  0.860588
6            2    Petal_Width  1.635294        None  0.191572
7            2   Sepal_Length  6.266176        None  0.545555
8            2    Sepal_Width  2.854412        None  0.128786
9            2        Species       NaN  versicolor       NaN
10           3   Petal_Length  1.488889        None  0.033016
11           3    Petal_Width  0.250000        None  0.012857
12           3   Sepal_Length  5.011111        None  0.113016
13           3    Sepal_Width  3.488889        None  0.134159
14           3        Species       NaN      setosa       NaN

Leaf Cluster Counts: 

   CLUSTER_ID  CNT
0           2   68
1           3   36

Attribute Importance: 

  ATTRIBUTE_NAME  ATTRIBUTE_IMPORTANCE_VALUE  ATTRIBUTE_RANK
0   Petal_Length                    0.558311               2
1    Petal_Width                    0.556300               3
2   Sepal_Length                    0.469978               4
3    Sepal_Width                    0.196211               5
4        Species                    0.612463               1

Components: 

   COMPONENT_ID  CLUSTER_ID  PRIOR_PROBABILITY
0             1           2           0.115366
1             2           2           0.079158
2             3           3           0.113448
3             4           2           0.148059
4             5           3           0.126979
5             6           2           0.134402
6             7           3           0.105727
7             8           2           0.176860

Cluster Hists: 

     cluster.id            variable  bin.id  lower.bound  upper.bound  \
0             1        Petal_Length       1         1.00         1.59   
1             1        Petal_Length       2         1.59         2.18   
2             1        Petal_Length       3         2.18         2.77   
3             1        Petal_Length       4         2.77         3.36
...          ...                 ...     ...          ...          ...    
137           3         Sepal_Width      11          NaN          NaN   
138           3     Species:'Other'       1          NaN          NaN   
139           3      Species:setosa       2          NaN          NaN   
140           3  Species:versicolor       3          NaN          NaN   

         label  count  
0       1:1.59     25  
1    1.59:2.18     11  
2    2.18:2.77      0  
3    2.77:3.36      3  
...        ...    ...
137          :      0  
138          :      0  
139          :     36  
140          :      0  

[141 rows x 7 columns]

Rules: 

    cluster.id  rhs.support  rhs.conf  lhr.support  lhs.conf       lhs.var  \
0            1          104  1.000000           93  0.892157   Sepal_Width   
1            1          104  1.000000           93  0.892157   Sepal_Width   
2            1          104  1.000000           99  0.892157  Petal_Length   
3            1          104  1.000000           99  0.892157  Petal_Length   
...        ...          ...       ...          ...       ...           ...
26           3           36  0.346154           36  0.972222  Petal_Length 
27           3           36  0.346154           36  0.972222  Sepal_Length   
28           3           36  0.346154           36  0.972222  Sepal_Length   
29           3           36  0.346154           36  0.972222       Species   

    lhs.var.support  lhs.var.conf              predicate  
0                93      0.400000    Sepal_Width <= 3.92  
1                93      0.400000     Sepal_Width > 2.48  
2                93      0.222222   Petal_Length <= 6.31  
3                93      0.222222      Petal_Length >= 1  
...             ...           ...                    ...  
26               35      0.134398      Petal_Length >= 1  
27               35      0.094194    Sepal_Length <= 5.74  
28               35      0.094194     Sepal_Length >= 4.3  
29               35      0.281684        Species = setosa 

[30 rows x 9 columns]

>>> # Use the model to make predictions on the test data.
... em_mod.predict(test_dat)
    CLUSTER_ID
0            3
1            3
2            3
3            3
...        ...
42           2
43           2
44           2
45           2

>>> # Make predictions and return the probability for each class
... # on new data.
>>> em_mod.predict_proba(test_dat, 
...   supplemental_cols = test_dat[:, 
...     ['Sepal_Length', 'Sepal_Width', 
...      'Petal_Length']]).sort_values(by = ['Sepal_Length', 
...        'Sepal_Width',  'Petal_Length', 
...        'PROBABILITY_OF_2', 'PROBABILITY_OF_3'])
    Sepal_Length  Sepal_Width  Petal_Length  PROBABILITY_OF_2  \
0            4.4          3.0           1.3      4.680788e-20   
1            4.4          3.2           1.3      1.052071e-20   
2            4.5          2.3           1.3      7.751240e-06  
3            4.8          3.4           1.6      5.363418e-19   
...          ...          ...           ...               ...   
43           6.9          3.1           4.9      1.000000e+00   
44           6.9          3.1           5.4      1.000000e+00   
45           7.0          3.2           4.7      1.000000e+00   

    PROBABILITY_OF_3  
0       1.000000e+00  
1       1.000000e+00  
2       9.999922e-01  
3       1.000000e+00  
...              ...  
43     3.295578e-97  
44    6.438740e-137 
45     3.853925e-89  
  
>>> 
>>> # Change the random seed and refit the model.
... em_mod.set_params(EMCS_RANDOM_SEED = '5').fit(train_dat)

Algorithm Name: Expectation Maximization

Mining Function: CLUSTERING

Settings: 
                    setting name                  setting value
0                      ALGO_NAME  ALGO_EXPECTATION_MAXIMIZATION
1              CLUS_NUM_CLUSTERS                              2
2        EMCS_CLUSTER_COMPONENTS       EMCS_CLUSTER_COMP_ENABLE
3        EMCS_CLUSTER_STATISTICS         EMCS_CLUS_STATS_ENABLE
4            EMCS_CLUSTER_THRESH                              2
5          EMCS_LINKAGE_FUNCTION            EMCS_LINKAGE_SINGLE
6       EMCS_LOGLIKE_IMPROVEMENT                           .001
7           EMCS_MAX_NUM_ATTR_2D                             50
8      EMCS_MIN_PCT_ATTR_SUPPORT                             .1
9              EMCS_MODEL_SEARCH      EMCS_MODEL_SEARCH_DISABLE
10           EMCS_NUM_COMPONENTS                             20
11         EMCS_NUM_DISTRIBUTION          EMCS_NUM_DISTR_SYSTEM
12       EMCS_NUM_EQUIWIDTH_BINS                             11
13           EMCS_NUM_ITERATIONS                            100
14          EMCS_NUM_PROJECTIONS                             50
15              EMCS_RANDOM_SEED                              5
16        EMCS_REMOVE_COMPONENTS       EMCS_REMOVE_COMPS_ENABLE
17                  ODMS_DETAILS                    ODMS_ENABLE
18  ODMS_MISSING_VALUE_TREATMENT        ODMS_MISSING_VALUE_AUTO
19                 ODMS_SAMPLING          ODMS_SAMPLING_DISABLE
20                     PREP_AUTO                             ON

Computed Settings: 
                 setting name             setting value
0       EMCS_ATTRIBUTE_FILTER  EMCS_ATTR_FILTER_DISABLE
1  EMCS_CONVERGENCE_CRITERION        EMCS_CONV_CRIT_BIC
2      EMCS_NUM_QUANTILE_BINS                         3
3          EMCS_NUM_TOPN_BINS                         3

Global Statistics: 
       attribute name  attribute value
0           CONVERGED              YES
1       LOGLIKELIHOOD         -1.75777
2        NUM_CLUSTERS                2
3      NUM_COMPONENTS                9
4            NUM_ROWS              104
5         RANDOM_SEED                5
6  REMOVED_COMPONENTS               11

Attributes: 
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species

Partition: NO

Clusters: 

   CLUSTER_ID  CLUSTER_NAME  RECORD_COUNT  PARENT  TREE_LEVEL  LEFT_CHILD_ID  \
0           1             1           104     NaN           1            2.0   
1           2             2            36     1.0           2            NaN   
2           3             3            68     1.0           2            NaN   

   RIGHT_CHILD_ID  
0             3.0  
1             NaN  
2             NaN  

Taxonomy: 

   PARENT_CLUSTER_ID  CHILD_CLUSTER_ID
0                  1               2.0
1                  1               3.0
2                  2               NaN
3                  3               NaN

Centroids: 

    CLUSTER_ID ATTRIBUTE_NAME      MEAN  MODE_VALUE  VARIANCE
0            1   Petal_Length  3.721154        None  3.234694
1            1    Petal_Width  1.155769        None  0.567539
2            1   Sepal_Length  5.831731        None  0.753255
3            1    Sepal_Width  3.074038        None  0.221358
4            1        Species       NaN      setosa       NaN
5            2   Petal_Length  1.488889        None  0.033016
6            2    Petal_Width  0.250000        None  0.012857
7            2   Sepal_Length  5.011111        None  0.113016
8            2    Sepal_Width  3.488889        None  0.134159
9            2        Species       NaN      setosa       NaN
10           3   Petal_Length  4.902941        None  0.860588
11           3    Petal_Width  1.635294        None  0.191572
12           3   Sepal_Length  6.266176        None  0.545555
13           3    Sepal_Width  2.854412        None  0.128786
14           3        Species       NaN  versicolor       NaN

Leaf Cluster Counts: 

   CLUSTER_ID  CNT
0           2   36
1           3   68

Attribute Importance: 

  ATTRIBUTE_NAME  ATTRIBUTE_IMPORTANCE_VALUE  ATTRIBUTE_RANK
0   Petal_Length                    0.558311               2
1    Petal_Width                    0.556300               3
2   Sepal_Length                    0.469978               4
3    Sepal_Width                    0.196211               5
4        Species                    0.612463               1

Components: 

   COMPONENT_ID  CLUSTER_ID  PRIOR_PROBABILITY
0             1           2           0.113452
1             2           2           0.105727
2             3           3           0.114202
3             4           3           0.086285
4             5           3           0.067294
5             6           2           0.124365
6             7           3           0.126975
7             8           3           0.105761
8             9           3           0.155939

Cluster Hists: 

     cluster.id            variable  bin.id  lower.bound  upper.bound  \
0             1        Petal_Length       1         1.00         1.59   
1             1        Petal_Length       2         1.59         2.18   
2             1        Petal_Length       3         2.18         2.77   
3             1        Petal_Length       4         2.77         3.36   
...         ...                 ...     ...          ...          ...
137           3         Sepal_Width      11          NaN          NaN   
138           3     Species:'Other'       1          NaN          NaN   
139           3      Species:setosa       3          NaN          NaN   
140           3  Species:versicolor       2          NaN          NaN   

         label  count  
0       1:1.59     25 
1    1.59:2.18     11  
2    2.18:2.77      0  
3    2.77:3.36      3  
...        ...    ...
137          :      0 
138          :     33  
139          :      0  
140          :     35  

[141 rows x 7 columns]

Rules: 

    cluster.id  rhs.support  rhs.conf  lhr.support  lhs.conf       lhs.var  \
0            1          104  1.000000           93  0.894231   Sepal_Width   
1            1          104  1.000000           93  0.894231   Sepal_Width   
2            1          104  1.000000           99  0.894231  Petal_Length   
3            1          104  1.000000           99  0.894231  Petal_Length   
...        ...          ...       ...          ...       ...           ...
26           3           68  0.653846           68  0.955882  Sepal_Length   
27           3           68  0.653846           68  0.955882  Sepal_Length   
28           3           68  0.653846           68  0.955882       Species
29           3           68  0.653846           68  0.955882       Species 

    lhs.var.support  lhs.var.conf              predicate  
0                93      0.400000    Sepal_Width <= 3.92  
1                93      0.400000     Sepal_Width > 2.48  
2                93      0.222222   Petal_Length <= 6.31  
3                93      0.222222      Petal_Length >= 1  
...             ...           ...                    ...  
26               65      0.026013    Sepal_Length <= 7.9  
27               65      0.026013    Sepal_Length > 4.66
28               65      0.125809     Species IN 'Other'
29               65      0.125809  Species IN versicolor
