Association Rules

9.8 Association Rules

The oml.ar class implements the Apriori algorithm to find frequent itemsets and association rules, all as part of an association model object.

The Apriori algorithm is efficient and scales well with respect to the number of transactions, number of items, and number of itemsets and rules produced.
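Apriori searches level by level. It relies on the property that every subset of a frequent itemset is itself frequent. Candidate itemsets of size k are therefore built only from frequent itemsets of size k-1, and they are pruned before the data is counted again.

The following is a minimal, in-memory Python sketch of that textbook procedure. It is not Oracle's in-database implementation, and it rests on these assumptions:

* Transactions are Python sets of item strings.
* Support is measured as a fraction of transactions.
* An itemset is kept when its support is at least `min_support`.
* The `apriori` function and the `baskets` data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Hypothetical level-wise Apriori search for frequent itemsets.

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support, the fraction of transactions that contain it. An itemset is
    kept when its support is at least min_support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for b in baskets for item in b}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset before counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=0.5)
for itemset, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

In this data the pair {milk, butter} occurs in only one of four baskets. The three-item candidate is therefore pruned without being counted.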

Use the oml.ar class to identify frequent itemsets within large volumes of transactional data, such as in market basket analysis. The results of an association model are the rules that identify patterns of association within the data.
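Transactional data is often laid out one item per row. This is the native transactional format that Table 9-3 describes under `ODMS_ITEM_ID_COLUMN_NAME`: a Case ID column and an Item ID column.

The following pure-Python sketch shows how such rows correspond to the per-transaction item sets that market basket analysis works on. The data and the `to_baskets` helper are hypothetical. The sketch illustrates the data layout only; it does not show how OML4Py reads a database table.

```python
from collections import defaultdict

# Hypothetical build data in native transactional format:
# one (case_id, item_id) row per item in a transaction.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"), (3, "butter"),
    (4, "milk"),
]


def to_baskets(rows):
    """Group (case_id, item_id) rows into one set of items per case."""
    baskets = defaultdict(set)
    for case_id, item_id in rows:
        baskets[case_id].add(item_id)
    return dict(baskets)


baskets = to_baskets(rows)
print(sorted(baskets[3]))  # ['bread', 'butter', 'milk']
```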

An association rule identifies a pattern in the data in which the appearance of a set of items in a transactional record implies another set of items. The groups of items used to form rules must pass a minimum threshold according to how often they occur (the support of the rule) and how often the consequent follows the antecedent (the confidence of the rule). Association models generate all rules that have support and confidence greater than user-specified thresholds.
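Support and confidence can be computed directly on a small set of transactions. Support is the fraction of transactions that contain all items of the rule. Confidence is the rule's support divided by the antecedent's support, that is, how often the consequent follows the antecedent.

The following sketch uses hypothetical baskets, helper names, and thresholds. It compares against the thresholds with "greater than", as the paragraph above states.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """support(antecedent | consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule: {bread} -> {milk}
rule_support = support({"bread", "milk"}, baskets)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, baskets)  # 2 of the 3 bread baskets

# Hypothetical user-specified thresholds.
min_support, min_confidence = 0.1, 0.1
reported = rule_support > min_support and rule_confidence > min_confidence
print(rule_support, round(rule_confidence, 3), reported)
```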

Oracle Machine Learning does not support the scoring operation for association modeling.

For information on the oml.ar class attributes and methods, invoke help(oml.ar) or see Oracle Machine Learning for Python API Reference.

Settings for an Association Rules Model

The following table lists the settings applicable to association rules models.

Table 9-3 Association Rules Models Settings

Setting Name	Setting Value	Description
`ASSO_ABS_ERROR`	`0 < ASSO_ABS_ERROR ≤ MAX(ASSO_MIN_SUPPORT, ASSO_MIN_CONFIDENCE)`	Specifies the absolute error for association rules sampling. A smaller value of `ASSO_ABS_ERROR` produces a larger sample, which gives more accurate results but takes longer to compute. Set a reasonable value for `ASSO_ABS_ERROR`, such as the default value, to avoid an overly large sample. The default value is `0.5 * MAX(ASSO_MIN_SUPPORT, ASSO_MIN_CONFIDENCE)`.
`ASSO_AGGREGATES`	`NULL`	Specifies the columns to aggregate, as a comma-separated list of column names. The list can contain at most 10 columns. You can set `ASSO_AGGREGATES` if you have specified a column name with `ODMS_ITEM_ID_COLUMN_NAME`. The data table must have valid column names, such as `ITEM_ID` and `CASE_ID`, which are derived from `ODMS_ITEM_ID_COLUMN_NAME`. An item value column is not mandatory. The default value is `NULL`. You can supply several columns to aggregate for each item. However, doing so requires more memory to buffer the extra data, and it reduces performance because of the larger input data set and the additional operations.
`ASSO_ANT_IN_RULES`	`NULL`	Sets Including Rules for the antecedent: a comma-separated list of strings, at least one of which must appear in the antecedent part of each reported association rule. The default value is `NULL`.
`ASSO_ANT_EX_RULES`	`NULL`	Sets Excluding Rules for the antecedent: a comma-separated list of strings, none of which can appear in the antecedent part of any reported association rule. The default value is `NULL`.
`ASSO_CONF_LEVEL`	`0 ≤ ASSO_CONF_LEVEL ≤ 1`	Specifies the confidence level for an association rules sample. A larger value of `ASSO_CONF_LEVEL` produces a larger sample. Any value between `0.9` and `1` is suitable. The default value is `0.95`.
`ASSO_CONS_IN_RULES`	`NULL`	Sets Including Rules for the consequent: a comma-separated list of strings, at least one of which must appear in the consequent part of each reported association rule. The default value is `NULL`.
`ASSO_CONS_EX_RULES`	`NULL`	Sets Excluding Rules for the consequent: a comma-separated list of strings, none of which can appear in the consequent part of a reported association rule. You can use the excluding rule to reduce the data that must be stored, but you may need to build extra models to apply different Including or Excluding Rules. The default value is `NULL`.
`ASSO_EX_RULES`	`NULL`	Sets Excluding Rules applied to each association rule: a comma-separated list of strings that cannot appear in an association rule. No rule can contain any item in the list. The default value is `NULL`.
`ASSO_IN_RULES`	`NULL`	Sets Including Rules applied to each association rule: a comma-separated list of strings, at least one of which must appear in each reported association rule, either as antecedent or as consequent. The default value is `NULL`, which specifies that filtering is not applied.
`ASSO_MAX_RULE_LENGTH`	`TO_CHAR(2 <= numeric_expr <= 20)`	Maximum rule length for association rules. The default value is `4`.
`ASSO_MIN_CONFIDENCE`	`TO_CHAR(0 <= numeric_expr <= 1)`	Minimum confidence for association rules. The default value is `0.1`.
`ASSO_MIN_REV_CONFIDENCE`	`TO_CHAR(0 <= numeric_expr <= 1)`	Sets the Minimum Reverse Confidence that each rule must satisfy. The Reverse Confidence of a rule is the number of transactions in which the rule occurs divided by the number of transactions in which the consequent occurs. The value is a real number between 0 and 1. The default value is `0`.
`ASSO_MIN_SUPPORT`	`TO_CHAR(0 <= numeric_expr <= 1)`	Minimum support for association rules. The default value is `0.1`.
`ASSO_MIN_SUPPORT_INT`	a positive integer	Minimum absolute support that each rule must satisfy. The value must be an integer. The default value is `1`.
`ODMS_ITEM_ID_COLUMN_NAME`	column_name	The name of a column that contains the items in a transaction. When you specify this setting, the algorithm expects the data to be presented in native transactional format, consisting of two columns: a Case ID, either categorical or numeric, and an Item ID, either categorical or numeric.
`ODMS_ITEM_VALUE_COLUMN_NAME`	column_name	The name of a column that contains a value associated with each item in a transaction. Use this setting only when you have specified a value for `ODMS_ITEM_ID_COLUMN_NAME`, indicating that the data is presented in native transactional format. If you also use `ASSO_AGGREGATES`, then the build data must include the columns specified in the `ASSO_AGGREGATES` setting plus the following three columns: a Case ID, either categorical or numeric; an Item ID, either categorical or numeric, specified by `ODMS_ITEM_ID_COLUMN_NAME`; and an Item Value, either categorical or numeric, specified by `ODMS_ITEM_VALUE_COLUMN_NAME`. If `ASSO_AGGREGATES`, Case ID, and Item ID columns are present, then the Item Value column may or may not appear. The Item Value column may specify information such as the number of items (for example, three apples) or the type of the item (for example, macintosh apples).
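The six Including and Excluding Rules settings can be read as set tests on each candidate rule. The following sketch is a hypothetical, client-side illustration of those semantics as Table 9-3 states them. It assumes items are plain strings matched exactly.

The `keep_rule` function and its parameter names are assumptions, not part of OML4Py. They map to the settings as follows:

* `ant_in` stands for `ASSO_ANT_IN_RULES`.
* `ant_ex` stands for `ASSO_ANT_EX_RULES`.
* `cons_in` and `cons_ex` stand for `ASSO_CONS_IN_RULES` and `ASSO_CONS_EX_RULES`.
* `rule_in` and `rule_ex` stand for `ASSO_IN_RULES` and `ASSO_EX_RULES`.

In OML4Py itself, you pass these settings to `oml.ar` like any other model setting.

```python
def _items(setting):
    """Parse a comma-separated setting string (or None) into a set of items."""
    if setting is None:
        return None
    return {s.strip() for s in setting.split(",") if s.strip()}


def keep_rule(antecedent, consequent, ant_in=None, ant_ex=None,
              cons_in=None, cons_ex=None, rule_in=None, rule_ex=None):
    """Return True if the rule antecedent -> consequent passes every filter.

    Each filter is a comma-separated string, or None for no filtering.
    *_in: at least one listed item must appear in that part of the rule.
    *_ex: none of the listed items may appear in that part of the rule.
    rule_in and rule_ex test antecedent and consequent together.
    """
    ant, cons = set(antecedent), set(consequent)
    whole = ant | cons
    checks = [
        (ant_in, ant, True), (ant_ex, ant, False),
        (cons_in, cons, True), (cons_ex, cons, False),
        (rule_in, whole, True), (rule_ex, whole, False),
    ]
    for setting, part, must_overlap in checks:
        items = _items(setting)
        if items is None:
            continue  # this filter is not set
        if bool(items & part) != must_overlap:
            return False
    return True


# Hypothetical candidate rules as (antecedent, consequent) pairs.
rules = [({"bread"}, {"milk"}), ({"milk"}, {"bread"}), ({"bread"}, {"butter"})]
# Keep rules whose consequent contains milk or bread and that never mention butter.
kept = [r for r in rules if keep_rule(*r, cons_in="milk,bread", rule_ex="butter")]
print(kept)
```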


Example 9-8 Using the oml.ar Class

This example uses methods of the oml.ar class.

import pandas as pd
from sklearn import datasets 
import oml

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

# Drop the IRIS table if it already exists; ignore the error if it does not.
try:
    oml.drop('IRIS')
except Exception:
    pass

# Create the IRIS database table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Create training data.
train_dat = oml.sync(table = 'IRIS')

# Specify settings.
setting = {'asso_min_support':'0.1', 'asso_min_confidence':'0.1'}

# Create an AR model object.
ar_mod = oml.ar(**setting)

# Fit the model according to the training data and parameter 
# settings.
ar_mod = ar_mod.fit(train_dat)

# Show details of the model.
ar_mod

Listing for This Example

>>> import pandas as pd
>>> from sklearn import datasets 
>>> import oml
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>> x = pd.DataFrame(iris.data, 
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x: 
...                            {0: 'setosa', 1: 'versicolor', 
...                             2:'virginica'}[x], iris.target)), 
...                  columns = ['Species'])
>>>
>>> try:
...    oml.drop('IRIS')
... except: 
...    pass
>>>
>>> # Create the IRIS database table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training data.
... train_dat = oml.sync(table = 'IRIS')
>>>
>>> # Specify settings.
... setting = {'asso_min_support':'0.1', 'asso_min_confidence':'0.1'}
>>>
>>> # Create an AR model object.
... ar_mod = oml.ar(**setting)
>>> 
>>> # Fit the model according to the training data and parameter 
... # settings.
>>> ar_mod = ar_mod.fit(train_dat)
>>> 
>>> # Show details of the model.
... ar_mod

Algorithm Name: Association Rules

Mining Function: ASSOCIATION

Settings: 
                   setting name                   setting value
0                     ALGO_NAME  ALGO_APRIORI_ASSOCIATION_RULES
1          ASSO_MAX_RULE_LENGTH                               4
2           ASSO_MIN_CONFIDENCE                             0.1
3       ASSO_MIN_REV_CONFIDENCE                               0
4              ASSO_MIN_SUPPORT                             0.1
5          ASSO_MIN_SUPPORT_INT                               1
6                  ODMS_DETAILS                     ODMS_ENABLE
7  ODMS_MISSING_VALUE_TREATMENT         ODMS_MISSING_VALUE_AUTO
8                 ODMS_SAMPLING           ODMS_SAMPLING_DISABLE
9                     PREP_AUTO                              ON

Global Statistics: 
     attribute name  attribute value         
0     ITEMSET_COUNT         6.000000
1       MAX_SUPPORT         0.333333
2          NUM_ROWS       150.000000
3        RULE_COUNT         2.000000
4 TRANSACTION_COUNT       150.000000

Attributes: 
Petal_Length 
Petal_Width 
Sepal_Length 
Sepal_Width 
Species

Partition: NO

Itemsets: 

   ITEMSET_ID   SUPPORT  NUMBER_OF_ITEMS    ITEM_NAME          ITEM_VALUE
0           1  0.193333                1  Petal_Width  .20000000000000001
1           2  0.173333                1  Sepal_Width                   3
2           3  0.333333                1      Species              setosa
3           4  0.333333                1      Species          versicolor
4           5  0.333333                1      Species           virginica
5           6  0.193333                2  Petal_Width  .20000000000000001
6           6  0.193333                2      Species              setosa

Rules: 

   RULE_ID  NUMBER_OF_ITEMS     LHS_NAME           LHS_VALUE     RHS_NAME  \
0        1                2      Species              setosa  Petal_Width
1        2                2  Petal_Width  .20000000000000001      Species

  RHS_VALUE   SUPPORT  CONFIDENCE  REVCONFIDENCE  LIFT  
0      None  0.186667        0.58           1.00     3  
1      None  0.186667        1.00           0.58     3
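As a check on this output, the CONFIDENCE, REVCONFIDENCE, and LIFT columns can be recomputed from the supports in the Itemsets table. The sketch takes the support of itemset 6 (Petal_Width .2 together with Species setosa) as the rule support.

It assumes lift is confidence divided by the support of the consequent, which is the usual definition. Reverse confidence follows the definition given for `ASSO_MIN_REV_CONFIDENCE`. The `rule_metrics` helper is hypothetical.

```python
# Supports taken from the Itemsets table above (150 transactions).
supp_setosa = 0.333333  # itemset 3: Species = setosa
supp_pw = 0.193333      # itemset 1: Petal_Width = .2
supp_both = 0.193333    # itemset 6: both items together


def rule_metrics(supp_rule, supp_lhs, supp_rhs):
    """Return (confidence, reverse confidence, lift), rounded to 2 places."""
    confidence = supp_rule / supp_lhs       # how often RHS follows LHS
    rev_confidence = supp_rule / supp_rhs   # how often LHS accompanies RHS
    lift = confidence / supp_rhs            # confidence relative to RHS alone
    return round(confidence, 2), round(rev_confidence, 2), round(lift, 2)


rule1 = rule_metrics(supp_both, supp_setosa, supp_pw)  # setosa -> Petal_Width .2
rule2 = rule_metrics(supp_both, supp_pw, supp_setosa)  # Petal_Width .2 -> setosa
print(rule1, rule2)
```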
