9.20 Exponential Smoothing Method
The oml.esm class uses the Exponential Smoothing Method (ESM) algorithm to create a time series model.
Exponential Smoothing Methods have been widely used in forecasting for over half a century. They have applications at the strategic, tactical, and operational levels. For example, at a strategic level, forecasting is used for projecting return on investment, growth, and the effect of innovations. At a tactical level, forecasting is used for projecting costs, inventory requirements, and customer satisfaction. At an operational level, forecasting is used for setting targets and predicting quality and conformance with standards.
In its simplest form, Exponential Smoothing is a moving average method with a single parameter that models an exponentially decreasing effect of past levels on future values. With a variety of extensions, Exponential Smoothing covers a broader class of models than other well-known approaches, such as the Box-Jenkins auto-regressive integrated moving average (ARIMA) approach. Oracle Machine Learning implements Exponential Smoothing using a state-of-the-art state space method with a single source of error (SSOE) assumption, which provides theoretical and performance advantages.
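The single-parameter form can be illustrated with the textbook recursion level_t = alpha * y_t + (1 - alpha) * level_(t-1), in which an observation k steps in the past carries weight alpha * (1 - alpha)^k. The following is a minimal sketch in plain Python, not the state space implementation used by Oracle Machine Learning. The function name simple_exp_smooth and the choice to initialize the level with the first observation are assumptions made for illustration.

```python
def simple_exp_smooth(values, alpha):
    """Return one-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the forecast for values[t] made before seeing it; the
    final element is the forecast for the next, unseen observation.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1, exclusive")
    if not values:
        raise ValueError("values must not be empty")
    level = values[0]  # assumption: start the level at the first observation
    forecasts = [level]
    for y in values:
        # New level: weighted average of the newest observation and the old level.
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


print(simple_exp_smooth([10, 12, 14], alpha=0.5))
```

A larger alpha makes the forecast follow recent observations more closely; a smaller alpha gives more weight to the history.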
Settings for an ESM model
The following table lists settings for ESM models.
Table 9-18 ESM Model Settings
Setting Name | Setting Value | Description |
---|---|---|
EXSM_MODEL | A value from the set {EXSM_SIMPLE, EXSM_SIMPLE_MULT, EXSM_HOLT, EXSM_HOLT_DMP, EXSM_MUL_TRND, EXSM_MULTRD_DMP, EXSM_SEAS_ADD, EXSM_SEAS_MUL, EXSM_HW, EXSM_HW_DMP, EXSM_HW_ADDSEA, EXSM_DHW_ADDSEA, EXSM_HWMT, EXSM_HWMT_DMP} | This setting specifies the model. The default value is EXSM_SIMPLE. |
EXSM_SEASONALITY | A positive integer greater than 1 | This setting specifies the length of the seasonal cycle as a positive integer. The value must be larger than 1. This setting applies only to models with seasonality and must be provided for them; otherwise the model throws an error. |
EXSM_INTERVAL | A value from the set {EXSM_INTERVAL_YEAR, EXSM_INTERVAL_QTR, EXSM_INTERVAL_MONTH, EXSM_INTERVAL_WEEK, EXSM_INTERVAL_DAY, EXSM_INTERVAL_HOUR, EXSM_INTERVAL_MIN, EXSM_INTERVAL_SEC} | This setting applies only, and must be provided, when the time column has datetime type. The model throws an error if the time column of the input table is of datetime type and EXSM_INTERVAL is not provided. The model also throws an error if the time column of the input table is of Oracle NUMBER type and EXSM_INTERVAL is provided. |
EXSM_ACCUMULATE | A value from the set {EXSM_ACCU_TOTAL, EXSM_ACCU_STD, EXSM_ACCU_MAX, EXSM_ACCU_MIN, EXSM_ACCU_AVG, EXSM_ACCU_MEDIAN, EXSM_ACCU_COUNT} | This setting applies only, and must be provided, when the time column has datetime type. It specifies how to generate the value of the accumulated time series from the input time series. |
EXSM_SETMISSING | A number, or an option from the set {EXSM_MISS_MIN, EXSM_MISS_MAX, EXSM_MISS_AVG, EXSM_MISS_MEDIAN, EXSM_MISS_LAST, EXSM_MISS_FIRST, EXSM_MISS_PREV, EXSM_MISS_NEXT, EXSM_MISS_AUTO} | This setting specifies how to handle missing values, which may come from the input data, from the accumulation process of the time series, or both. You can specify either a number or an option. If a number is specified, all missing values are set to that number. If this setting is not provided, the default is EXSM_MISS_AUTO. |
EXSM_PREDICTION_STEP | A number between 1 and 30 | This setting specifies how many steps ahead the predictions are to be made. If it is not set, the default value is 1. |
EXSM_CONFIDENCE_LEVEL | A number between 0 and 1, exclusive | This setting specifies the desired confidence level for prediction. The lower and upper bounds of the specified confidence interval are reported. If this setting is not specified, the default confidence level is 0.95. |
EXSM_OPT_CRITERION | A value from the set {EXSM_OPT_CRIT_LIK, EXSM_OPT_CRIT_MSE, EXSM_OPT_CRIT_AMSE, EXSM_OPT_CRIT_SIG, EXSM_OPT_CRIT_MAE} | This setting specifies the desired optimization criterion. The optimization criterion is useful as a diagnostic for comparing models' fit to the same data. The default value is EXSM_OPT_CRIT_LIK. |
EXSM_NMSE | A positive integer | This setting specifies the length of the window used in computing the error metric average mean square error (AMSE). |
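Because settings are passed to oml.esm as a plain dictionary, it can help to check the documented value ranges on the client before building a model. The sketch below is a hypothetical helper, not part of the oml package. The name check_esm_settings is invented for illustration, and treating the prediction step as an integer in the inclusive range 1 to 30 is an assumption about how "between 1-30" should be read.

```python
def check_esm_settings(settings):
    """Return a list of problems found in a few numeric ESM settings.

    Hypothetical client-side check; the database performs its own validation.
    """
    problems = []

    step = settings.get('EXSM_PREDICTION_STEP')
    if step is not None and not (isinstance(step, int) and 1 <= step <= 30):
        # Assumption: the documented range 1-30 is inclusive and integer-valued.
        problems.append('EXSM_PREDICTION_STEP must be an integer from 1 to 30')

    level = settings.get('EXSM_CONFIDENCE_LEVEL')
    if level is not None and not (isinstance(level, (int, float)) and 0 < level < 1):
        problems.append('EXSM_CONFIDENCE_LEVEL must be between 0 and 1, exclusive')

    nmse = settings.get('EXSM_NMSE')
    if nmse is not None and not (isinstance(nmse, int) and nmse > 0):
        problems.append('EXSM_NMSE must be a positive integer')

    return problems


settings = {'EXSM_INTERVAL': 'EXSM_INTERVAL_DAY',
            'EXSM_PREDICTION_STEP': 4,
            'EXSM_CONFIDENCE_LEVEL': 0.9}
print(check_esm_settings(settings))
```

An empty list means none of the checked ranges was violated; the dictionary could then be passed as oml.esm(**settings), as in the example that follows.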
Example 9-20 Using the oml.esm Class
This example creates an ESM model and uses some of the methods of the oml.esm class.
import oml
import pandas as pd

# Build a small DataFrame of events with start and end timestamps.
df = pd.DataFrame({'EVENT': ['A', 'B', 'C', 'D'],
                   'START': ['2021-10-04 13:29:00', '2021-10-07 12:30:00',
                             '2021-10-15 04:20:00', '2021-10-18 15:45:03'],
                   'END': ['2021-10-08 11:29:06', '2021-10-15 10:30:07',
                           '2021-10-29 05:50:15', '2021-10-22 15:40:03']})

# Convert the timestamps and derive each event's duration in hours and minutes.
df['START'] = pd.to_datetime(df['START'])
df['END'] = pd.to_datetime(df['END'])
df['DURATION'] = df['END'] - df['START']
df['HOURS'] = df['DURATION'] / pd.Timedelta(hours=1)
df['MINUTES'] = df['DURATION'] / pd.Timedelta(minutes=1)

# For an on-premises database, use the following command to connect.
oml.connect("<username>","<password>", dsn="<dsn>")
dat = oml.create(df, table='DF')

# Datetime time column: EXSM_INTERVAL must be set.
train_x = dat[:, 1]
train_y = dat[:, 4]
setting = {'EXSM_INTERVAL':'EXSM_INTERVAL_DAY'}
esm_mod = oml.esm(**setting).fit(train_x, train_y, time_seq = 'START')
esm_mod

# Float time column: no interval setting is given.
train_x = dat[:, 4]
train_y = dat[:, 5]
esm_mod = oml.esm().fit(train_x, train_y, time_seq = 'HOURS')
esm_mod
Listing for This Example
Create a pandas DataFrame with the start and end dates of each event. Convert the start and end date columns to datetime, and create a new column that contains the timedelta between the start and end dates. Then convert the timedelta into the total number of hours and the total number of minutes.
>>> import oml
>>> import pandas as pd
>>> df = pd.DataFrame({'EVENT': ['A', 'B', 'C', 'D'],
...                    'START': ['2021-10-04 13:29:00', '2021-10-07 12:30:00',
...                              '2021-10-15 04:20:00', '2021-10-18 15:45:03'],
...                    'END': ['2021-10-08 11:29:06', '2021-10-15 10:30:07',
...                            '2021-10-29 05:50:15', '2021-10-22 15:40:03']})
>>> df['START'] = pd.to_datetime(df['START'])
>>> df['END'] = pd.to_datetime(df['END'])
>>> df['DURATION'] = df['END'] - df['START']
>>> df['HOURS'] = df['DURATION'] / pd.Timedelta(hours=1)
>>> df['MINUTES'] = df['DURATION'] / pd.Timedelta(minutes=1)
>>> # For an on-premises database, use the following command to connect.
>>> oml.connect("<username>","<password>", dsn="<dsn>")
>>> dat = oml.create(df, table='DF')
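The HOURS and MINUTES columns come from dividing a timedelta by a unit timedelta. As a quick check of that arithmetic for event A, the following sketch uses only the Python standard library, whose timedelta division behaves the same way as the pd.Timedelta division in the listing. The variable names are chosen for illustration.

```python
from datetime import datetime, timedelta

# Start and end of event A, copied from the DataFrame in the example.
start = datetime.fromisoformat('2021-10-04 13:29:00')
end = datetime.fromisoformat('2021-10-08 11:29:06')

duration = end - start                     # 3 days, 22:00:06
hours = duration / timedelta(hours=1)      # timedelta / timedelta gives a float
minutes = duration / timedelta(minutes=1)

print(round(hours, 6), round(minutes, 6))
```

These values correspond to the first row of each prediction table in this example: about 94.001667 hours and 5640.1 minutes.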
Using Datetime type
>>> train_x = dat[:, 1]
>>> train_y = dat[:, 4]
>>> setting = {'EXSM_INTERVAL':'EXSM_INTERVAL_DAY'}
>>> esm_mod = oml.esm(**setting).fit(train_x, train_y, time_seq = 'START')
>>> esm_mod
Algorithm Name: Exponential Smoothing
Mining Function: TIME_SERIES
Target: HOURS
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPONENTIAL_SMOOTHING
1 EXSM_ACCUMULATE EXSM_ACCU_TOTAL
2 EXSM_CONFIDENCE_LEVEL .95
3 EXSM_INTERVAL EXSM_INTERVAL_DAY
4 EXSM_NMSE 3
5 EXSM_OPTIMIZATION_CRIT EXSM_OPT_CRIT_LIK
6 EXSM_PREDICTION_STEP 1
7 EXSM_SETMISSING EXSM_MISS_AUTO
8 ODMS_BOXCOX ODMS_BOXCOX_ENABLE
9 ODMS_DETAILS ODMS_ENABLE
10 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
11 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
12 PREP_AUTO ON
Computed Settings:
setting name setting value
0 EXSM_MODEL EXSM_SIMPLE
Global Statistics:
attribute name attribute value
0 -2 LOG-LIKELIHOOD -21.1618
1 AIC 48.3236
2 AICC None
3 ALPHA 0.000100034
4 ALPHA DISC 0.9999
5 AMSE 12175.3
6 BIC 46.4825
7 CONVERGED YES
8 INITIAL ALPHA 0.000100034
9 INITIAL LEVEL 179.353
10 MAE 84.403
11 MSE 9843.9
12 NUM_ROWS 4
13 SIGMA 140.313
14 STD 140.313
Attributes:
Partition: NO
Prediction:
TIME_SEQ VALUE PREDICTION LOWER UPPER
0 2021-10-04 94.001667 179.352705 NaN NaN
1 2021-10-07 190.001944 179.344167 NaN NaN
2 2021-10-15 337.504167 179.345233 NaN NaN
3 2021-10-18 95.916667 179.361069 NaN NaN
4 2021-10-19 NaN 179.352712 -95.656158 454.361582
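The LOWER and UPPER values in the last row can be related to the other numbers in this listing. The sketch below assumes a symmetric normal interval of the form forecast ± z * sigma, which happens to reproduce the listed bounds to within rounding. This is an observation about these particular numbers, not a description of how the library computes its intervals.

```python
from statistics import NormalDist

confidence_level = 0.95   # EXSM_CONFIDENCE_LEVEL from the Settings table
forecast = 179.352712     # PREDICTION for 2021-10-19
sigma = 140.313           # SIGMA from Global Statistics (rounded in the listing)

# Two-sided interval: put half of the excluded probability in each tail.
z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
lower = forecast - z * sigma
upper = forecast + z * sigma

print(round(lower, 3), round(upper, 3))
```

The small remaining differences from the listed -95.656158 and 454.361582 are consistent with SIGMA being displayed to only six significant digits.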
Using Float type
>>> train_x = dat[:, 4]
>>> train_y = dat[:, 5]
>>> esm_mod = oml.esm().fit(train_x, train_y, time_seq = 'HOURS')
>>> esm_mod
Algorithm Name: Exponential Smoothing
Mining Function: TIME_SERIES
Target: MINUTES
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPONENTIAL_SMOOTHING
1 EXSM_CONFIDENCE_LEVEL .95
2 EXSM_NMSE 3
3 EXSM_OPTIMIZATION_CRIT EXSM_OPT_CRIT_LIK
4 EXSM_PREDICTION_STEP 1
5 EXSM_SETMISSING EXSM_MISS_AUTO
6 ODMS_BOXCOX ODMS_BOXCOX_ENABLE
7 ODMS_DETAILS ODMS_ENABLE
8 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
9 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
10 PREP_AUTO ON
Computed Settings:
setting name setting value
0 EXSM_MODEL EXSM_HOLT
Global Statistics:
attribute name attribute value
0 -2 LOG-LIKELIHOOD 4.47424
1 AIC 1.05153
2 AICC None
3 ALPHA 0.000104161
4 AMSE 0.0190133
5 BETA 0.000104153
6 BIC -2.017
7 CONVERGED YES
8 INITIAL LEVEL 8.00977
9 INITIAL TREND 0.452033
10 LAMBDA 4.08563e-05
11 MAE 1175.53
12 MSE 0.0266914
13 NUM_ROWS 4
14 SIGMA 0.188649
15 STD 0.188649
Attributes:
Partition: NO
Prediction:
TIME_SEQ VALUE PREDICTION LOWER UPPER
0 94 5640.100000 4807.666451 NaN NaN
1 95 5755.000000 7554.329741 NaN NaN
2 190 11400.116667 11869.239245 NaN NaN
3 337 20250.250000 18649.004898 NaN NaN
4 338 NaN 29301.840039 19894.31833 41663.104953