Changes in This Release for Oracle Machine Learning for R
This section details the modifications and additions made to Oracle Machine Learning for R (OML4R) in the current release.
- New Features in 23ai
Oracle Machine Learning for R: new features in Oracle Database 23ai.
Algorithm Enhancements
Note:
New algorithm settings: model settings and algorithm-specific settings are documented in the Oracle Database PL/SQL Packages and Types Reference guide.
- The ore.odmNN class represents a Neural Network (NN) model for classification and regression and allows you to create one. This class replaces ore.neural by exposing the in-database neural network algorithm. Neural Network models can capture intricate nonlinear relationships between inputs and outputs or find patterns in data. See Neural Network Model.
- The ore.odmRF class represents a Random Forest (RF) model, an ensemble learning technique for classification, and allows you to create one. This class replaces ore.randomForest by exposing the in-database random forest algorithm. See Random Forest Model.
- The ore.odmESM class represents an Exponential Smoothing Model (ESM) for time series forecasting and allows you to create one. This class replaces ore.esm by exposing the in-database exponential smoothing algorithm. See Exponential Smoothing Model.
- The ore.odmXGB class represents XGBoost models and allows you to create them using the scalable gradient tree boosting system, which supports classification, regression, and survival analysis. See XGBoost Model.
- Additional Generalized Linear Model (GLM) link functions are supported for logistic regression. In addition to Logit, the new link functions are Probit, Cloglog, and Cauchit. See Generalized Linear Models.
The following settings allow the user to specify the link function for building a GLM model. The link functions are specific to the mining function.
For classification, the following settings are applicable:
- GLMS_LOGIT_LINK (default)
- GLMS_PROBIT_LINK
- GLMS_CLOGLOG_LINK
- GLMS_CAUCHIT_LINK
For regression, the following setting is applicable: GLMS_IDENTITY_LINK (default)
See Table 6-10.
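For orientation, the four classification link functions named above can be sketched through their inverse (mean) functions, which map a linear predictor to a probability. This is a minimal illustration using the standard textbook definitions and only the Python standard library. It is not OML4R code, and the dictionary that keys the functions by setting name is a hypothetical convenience for this example, not part of any Oracle API.

```python
import math
from statistics import NormalDist


def inverse_logit(eta: float) -> float:
    """Logistic CDF, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def inverse_probit(eta: float) -> float:
    """Standard normal CDF."""
    return NormalDist().cdf(eta)


def inverse_cloglog(eta: float) -> float:
    """1 - exp(-exp(eta)); intended for moderate eta (exp overflows near 710)."""
    return -math.expm1(-math.exp(eta))


def inverse_cauchit(eta: float) -> float:
    """Standard Cauchy CDF."""
    return 0.5 + math.atan(eta) / math.pi


# Hypothetical lookup keyed by the setting names from the text (illustration only).
INVERSE_LINKS = {
    "GLMS_LOGIT_LINK": inverse_logit,
    "GLMS_PROBIT_LINK": inverse_probit,
    "GLMS_CLOGLOG_LINK": inverse_cloglog,
    "GLMS_CAUCHIT_LINK": inverse_cauchit,
}

if __name__ == "__main__":
    for name, inverse_link in INVERSE_LINKS.items():
        print(name, [round(inverse_link(eta), 4) for eta in (-2.0, 0.0, 2.0)])
```

Logit, Probit, and Cauchit are symmetric around a probability of 0.5, while Cloglog is asymmetric; Cauchit has the heaviest tails of the four, so extreme predictor values move the probability toward 0 or 1 more slowly.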
-
XGBoost support for constraints and survival analysis
XGBoost supports monotonic and interaction constraints, as well as the AFT model for survival analysis. See XGBoost Model.
The following new settings are added for XGBoost support for constraints and survival analysis:
Note:
The XGBoost settings are case sensitive.
- xgboost_interaction_constraints
- xgboost_decrease_constraints
- xgboost_increase_constraints
- objective: survival:aft
- xgboost_aft_loss_distribution
- xgboost_aft_loss_distribution_scale
- xgboost_aft_right_bound_column_name
-
Supports embeddings for the Explicit Semantic Analysis (ESA) algorithm, where you can use ESA models to generate embeddings for text and other ESA input. This functionality is equivalent to doc2vec (document to vector representation). See Explicit Semantic Analysis.
The following new settings are added to support Explicit Semantic Analysis embeddings:
- ESAS_EMBEDDINGS: when enabled, generates embeddings during scoring for feature extraction models.
- ESAS_EMBEDDING_SIZE: specifies the size of the vectors representing embeddings.
See Table 6-7.
-
You can use the multiple time series feature of the Exponential Smoothing algorithm to prepare data for building time series regression models. See Exponential Smoothing Model.
The following setting is added to identify the frequency of outliers in the training data: EMCS_OUTLIER_RATE. See Table 6-6.
The following new settings are added to support enhanced time series forecasting:
- EXSM_SERIES_LIST: allows you to forecast up to twenty predictor series in addition to the target series.
- EXSM_INITVL_OPTIMIZE: determines whether initial values are optimized during model build.
- k-Means
The following new setting is added to restrict the data to a window of six standard deviations around the mean: KMNS_WINSORIZE
See Table 6-11.
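As a rough illustration of what winsorizing to such a window means, the sketch below clips each value to the interval from the mean minus three standard deviations to the mean plus three standard deviations, which is six standard deviations wide. It is a generic standard-library example, not the in-database implementation. Reading "six standard deviations around the mean" as three on each side, and using the population standard deviation, are both assumptions made for this example.

```python
from statistics import mean, pstdev


def winsorize(values, half_width_in_std=3.0):
    """Clip values to [mean - k*std, mean + k*std]; k = 3 gives a six-sigma window.

    Assumption: population standard deviation; the database may compute it differently.
    """
    if not values:
        return []
    center = mean(values)
    spread = pstdev(values)
    lower = center - half_width_in_std * spread
    upper = center + half_width_in_std * spread
    # Values inside the window are unchanged; values outside move to the nearest bound.
    return [min(max(v, lower), upper) for v in values]


if __name__ == "__main__":
    data = [10.0] * 20 + [1000.0]  # one extreme value among twenty typical ones
    clipped = winsorize(data)
    print(max(data), "->", max(clipped))
```

Clipping pulls extreme points toward the bulk of the data instead of discarding them, which limits their influence on distance-based centroid updates.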
-
Automated Time Series Model Search
Enables the Exponential Smoothing algorithm to select the best model type automatically when you do not specify the EXSM_MODEL setting. The algorithm searches for an acceptable time series model on its own, which can lead to more accurate forecasting. See Exponential Smoothing Model.
General Enhancements
You can find model settings and algorithm specific settings in Oracle Database PL/SQL Packages and Types Reference guide.
- New Shared Settings
The following new shared settings are added:
- ODMS_BOXCOX: enables the Box-Cox variance-stabilization transformation.
- ODMS_EXPLOSION_MIN_SUPP: enables more efficient, data-driven explosion data preparation for high-cardinality categorical columns. You can define the minimum support required for the categorical values in explosion mapping.
See Table 6-3.
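For readers unfamiliar with the Box-Cox transformation mentioned above, the following sketch applies its standard one-parameter form to positive values: (y^lambda - 1) / lambda for nonzero lambda, and log(y) when lambda is zero. It is a generic illustration, not Oracle code. The function name and the explicit lambda argument are assumptions for the example; how the database chooses lambda when ODMS_BOXCOX is enabled is not described here.

```python
import math


def box_cox(values, lam):
    """Standard one-parameter Box-Cox transform for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam approaches 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


if __name__ == "__main__":
    series = [1.0, 4.0, 9.0, 16.0]
    print(box_cox(series, 0.5))  # square-root-like compression of large values
    print(box_cox(series, 0))    # natural log
```

With lambda below 1, larger values are compressed more than smaller ones, which is why the transformation can stabilize variance in series whose spread grows with their level.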
- The in-database models now record, within the model's metadata, the query string that was used to specify the build data. The build_source column in the all_mining_models, user_mining_models, and dba_mining_models views enables you to see the data query used to produce the model. See ALL_MINING_MODELS.
- Improved Performance of Partitioned Models
Performance is improved for partitioned models with a high number of partitions and for dropping individual models within a partitioned model. See Partitioned Model.
- 4K Columns in Table
Database tables can now accommodate up to 4,096 columns. This functionality is referred to as Wide Tables. To enable or disable Wide Tables for your Oracle database, use the MAX_COLUMNS parameter. See MAX_COLUMNS.