Changes in This Release for Oracle Machine Learning for R

This section details the modifications and additions made to Oracle Machine Learning for R (OML4R) in the current release.

New Features in 23ai
Oracle Machine Learning for R: new features in Oracle Database 23ai.

Algorithm Enhancements

Note:

Model settings and algorithm-specific settings are documented in the Oracle Database PL/SQL Packages and Types Reference guide.

Neural Network

The ore.odmNN class represents and allows you to create a Neural Network (NN) model for classification and regression. This class replaces ore.neural by exposing the in-database neural network algorithm. Neural Network models can be used to capture intricate nonlinear relationships between inputs and outputs or to find patterns in data. See Neural Network Model.
Random Forest

The ore.odmRF class represents and allows you to create a Random Forest (RF) model that provides an ensemble learning technique for classification. This class replaces ore.randomForest by exposing the in-database random forest algorithm. See Random Forest Model.
Exponential Smoothing Model

The ore.odmESM class represents and allows you to create an Exponential Smoothing Model (ESM) for time series forecasting. This class replaces ore.esm by exposing the in-database exponential smoothing algorithm. See Exponential Smoothing Model.
XGBoost Model

The ore.odmXGB class represents and allows you to create XGBoost models using the scalable gradient tree boosting system that supports classification, regression, and survival analysis. See XGBoost Model.
GLM link functions

Supports additional Generalized Linear Model (GLM) link functions for logistic regression. In addition to Logit, new link functions include Probit, Cloglog, and Cauchit. See Generalized Linear Models.

The following settings allow the user to specify the link function for building a GLM model. The link functions are specific to the mining function.

For classification, the following settings are applicable:
- GLMS_LOGIT_LINK (default)
- GLMS_PROBIT_LINK
- GLMS_CLOGLOG_LINK
- GLMS_CAUCHIT_LINK
For regression, the following setting is applicable: GLMS_IDENTITY_LINK (default)

See Table 6-10.
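
To make the difference between the four classification link functions concrete, the following minimal Python sketch evaluates the standard inverse of each link, which maps a linear predictor to a probability. It illustrates the textbook definitions only and is not OML4R or database code. The function names and the dictionary keyed by the link names listed above are assumptions made for illustration.

```python
import math

# Textbook inverse link functions: each maps a linear predictor eta
# (any real number) to a probability in (0, 1).

def logit_inverse(eta):
    """Inverse of the logit link (logistic function)."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit_inverse(eta):
    """Inverse of the probit link (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_inverse(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def cauchit_inverse(eta):
    """Inverse of the cauchit link (standard Cauchy CDF)."""
    return 0.5 + math.atan(eta) / math.pi

# Hypothetical lookup keyed by the link names from the list above;
# this pairing exists only in this sketch.
INVERSE_LINKS = {
    "GLMS_LOGIT_LINK": logit_inverse,
    "GLMS_PROBIT_LINK": probit_inverse,
    "GLMS_CLOGLOG_LINK": cloglog_inverse,
    "GLMS_CAUCHIT_LINK": cauchit_inverse,
}

if __name__ == "__main__":
    for name, inverse in INVERSE_LINKS.items():
        print(name, [round(inverse(eta), 4) for eta in (-2.0, 0.0, 2.0)])
```

Logit, probit, and cauchit are symmetric around a linear predictor of 0, where each gives a probability of 0.5. Cloglog is asymmetric, and cauchit has heavier tails than the others, so it approaches 0 and 1 more slowly.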
XGBoost support for constraints and survival analysis

XGBoost supports monotonic and interaction constraints, as well as the AFT model for survival analysis. See XGBoost Model.

Note:
The XGBoost settings are case sensitive.

The following new settings are added for XGBoost support for constraints and survival analysis:
- xgboost_interaction_constraints
- xgboost_decrease_constraints
- xgboost_increase_constraints
- objective: survival:aft
- xgboost_aft_loss_distribution
- xgboost_aft_loss_distribution_scale
- xgboost_aft_right_bound_column_name
See Table 6-23.
Embeddings through ESA

Supports embeddings for the Explicit Semantic Analysis (ESA) algorithm, where you can use ESA models to generate embeddings for text and other ESA input. This functionality is equivalent to doc2vec (document to vector representation). See Explicit Semantic Analysis.

The following new settings are added to support Explicit Semantic Analysis embeddings:
- ESAS_EMBEDDINGS: when enabled, generates embeddings during scoring for feature extraction models.
- ESAS_EMBEDDING_SIZE: specifies the size of the vectors representing embeddings.
See Table 6-7.
Time Series Regression

You can use the multiple time series feature of the Exponential Smoothing algorithm to prepare data for building time series regression models. See Exponential Smoothing Model.

The following setting is added to identify the frequency of outliers in the training data: EMCS_OUTLIER_RATE. See Table 6-6.

The following new settings are added to support enhanced time series forecasting:
- EXSM_SERIES_LIST: allows you to forecast up to twenty predictor series in addition to the target series.
- EXSM_INITVL_OPTIMIZE: determines whether initial values are optimized during model build.
See Table 6-8.
k-Means
The following new setting is added to restrict the data to a window of six standard deviations around the mean: KMNS_WINSORIZE

See Table 6-11.
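
As an illustration of what winsorizing to a six-standard-deviation window means, the Python sketch below clips each value to the interval from the mean minus three standard deviations to the mean plus three standard deviations. It is a generic sketch and not the in-database implementation. The function name, the use of the population standard deviation, and the choice to clip values rather than discard them are all assumptions.

```python
import statistics

def winsorize(values, num_sd=3.0):
    """Clip values to [mean - num_sd * sd, mean + num_sd * sd].

    With num_sd=3.0 the window is six standard deviations wide and
    centered on the mean. Population standard deviation is an
    assumption of this sketch.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    lower = mean - num_sd * sd
    upper = mean + num_sd * sd
    return [min(max(v, lower), upper) for v in values]

if __name__ == "__main__":
    data = [10.0] * 20 + [1000.0]   # one extreme value
    clipped = winsorize(data)
    print(max(data), max(clipped))  # the extreme value is pulled inward
```

Limiting extreme values this way keeps a few outliers from dominating the distance computations that k-Means relies on.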
Automated Time Series Model Search

Enables the Exponential Smoothing algorithm to select the best model type automatically when you do not specify the EXSM_MODEL setting. This can lead to more accurate forecasting. See Exponential Smoothing Model.
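
The general idea of an automated model search can be sketched in Python as follows. This is not the search procedure used by the database, which is not described here. It is a toy example that fits two candidate model types (simple exponential smoothing and Holt's linear trend method) over a small parameter grid and keeps the one with the lowest one-step-ahead sum of squared errors. All function names, the parameter grid, and the selection criterion are assumptions.

```python
def ses_sse(series, alpha):
    """One-step-ahead SSE for simple exponential smoothing."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        err = y - level            # forecast is the current level
        sse += err * err
        level += alpha * err       # error-correction update
    return sse

def holt_sse(series, alpha, beta):
    """One-step-ahead SSE for Holt's linear trend method."""
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    sse = 0.0
    for y in series[1:]:
        forecast = level + trend
        err = y - forecast
        sse += err * err
        level = forecast + alpha * err
        trend = trend + alpha * beta * err
    return sse

def select_model(series, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (model_name, best_sse) for the candidate with lowest SSE."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    candidates = [
        ("simple", min(ses_sse(series, a) for a in grid)),
        ("holt", min(holt_sse(series, a, b) for a in grid for b in grid)),
    ]
    # min() keeps the first candidate on ties, so the simpler model wins.
    return min(candidates, key=lambda c: c[1])

if __name__ == "__main__":
    trending = [float(i) for i in range(1, 11)]
    print(select_model(trending)[0])
```

A realistic search would typically compare candidates with a criterion that penalizes extra parameters rather than raw in-sample error, which tends to favor the more flexible model.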

General Enhancements

You can find model settings and algorithm-specific settings in the Oracle Database PL/SQL Packages and Types Reference guide.

New Shared Settings
The following new shared settings are added:
- ODMS_BOXCOX: enables the Box-Cox variance-stabilization transformation.
- ODMS_EXPLOSION_MIN_SUPP: enables more efficient, data-driven explosion data preparation for high-cardinality categorical columns. You can define the minimum support required for the categorical values in the explosion mapping.
See Table 6-3.
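
For readers unfamiliar with the Box-Cox transformation, the Python sketch below shows its standard textbook form and its inverse for a given power parameter. It illustrates the mathematics only. The function names are hypothetical, and how the database selects or applies the parameter is not shown or assumed.

```python
import math

def box_cox(y, lam):
    """Textbook Box-Cox transform of a positive value y with power lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

if __name__ == "__main__":
    series = [1.0, 10.0, 100.0, 1000.0]
    for lam in (0.0, 0.5, 1.0):
        print(lam, [round(box_cox(y, lam), 3) for y in series])
```

A power of 0 corresponds to a log transform and a power of 1 leaves the shape of the data unchanged apart from a shift. Values in between compress large observations progressively less, which is how the transformation stabilizes variance that grows with the level of a series.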
Model Includes Data Lineage

In-database models now record, in the model's metadata, the query string that was used to specify the build data. The build_source column in the all_mining_models, user_mining_models, and dba_mining_models views enables you to see the data query used to produce the model. See ALL_MINING_MODELS.
Improved Performance of Partitioned Models
Performance is improved for partitioned models with a high number of partitions and for dropping individual models within a partitioned model. See Partitioned Model.
4K Columns in Table
The database tables can now accommodate up to 4,096 columns. This functionality is referred to as Wide Tables. To enable or disable Wide Tables for your Oracle database, you can use the MAX_COLUMNS parameter. See MAX_COLUMNS.
