1.3 OML4Py 2.0 Updates
Oracle Machine Learning for Python: new features in Oracle Database 23ai.
Algorithm Enhancements
Note:
New algorithm settings: you can find model settings and algorithm-specific settings in the Oracle Database PL/SQL Packages and Types Reference guide.
- GLMS_LINK_FUNCTION: this setting specifies the link function used to build a generalized linear model. The additional link functions are: Logit, Probit, Cloglog, and Cauchit. See Generalized Linear Model.
- The following new settings add XGBoost support for constraints and survival analysis.
  Note:
  The XGBoost settings are case sensitive.
  - Interaction and Monotonic Constraints
    xgboost_interaction_constraints
    xgboost_decrease_constraints
    xgboost_increase_constraints
  - Support for Survival Analysis
    objective: survival:aft
    xgboost_aft_loss_distribution
    xgboost_aft_loss_distribution_scale
    xgboost_aft_right_bound_column_name
- Explicit Semantic Analysis (ESA)
  The following settings are added to support generating embeddings through Explicit Semantic Analysis:
  - ESAS_EMBEDDINGS: when enabled, generates embeddings during scoring for feature extraction models.
  - ESAS_EMBEDDING_SIZE: specifies the size of the vectors representing embeddings.
  The ESA algorithm now supports embeddings. ESA embeddings enable you to use ESA models to generate embeddings for any text or other ESA input. This functionality is equivalent to doc2vec (document to vector representation). See Explicit Semantic Analysis.
- EMCS_OUTLIER_RATE: identifies the frequency of outliers in the training data. See Expectation Maximization.
- New settings for Exponential Smoothing to support Time Series regression models and initial value optimization for model build:
  - Multiple time series
    EXSM_SERIES_LIST: this setting enables you to forecast up to twenty predictor series in addition to the target series.
  - Automated model type search
    EXSM_INITVL_OPTIMIZE: determines whether initial values are optimized during model build.
    EXSM_MODEL setting. This leads to more accurate forecasting. For details, see Exponential Smoothing Method.
- KMNS_WINSORIZE: this setting restricts the data to a window of six standard deviations around the mean. See k-Means.
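To make the winsorizing idea behind KMNS_WINSORIZE concrete, here is a minimal plain-Python sketch. It assumes that "a window of six standard deviations around the mean" means clipping to mean ± 3 standard deviations, and it uses the population standard deviation; both choices are assumptions for illustration, and the in-database implementation may differ. The function name `winsorize` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def winsorize(values, n_std=3.0):
    """Clip every value to the interval [mean - n_std*sd, mean + n_std*sd].

    n_std=3.0 gives a total window of six standard deviations
    (an assumed reading of the setting's description).
    """
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation (assumption)
    lower, upper = mu - n_std * sd, mu + n_std * sd
    return [min(max(v, lower), upper) for v in values]


# Hypothetical data: nineteen ordinary points and one extreme outlier.
data = [10.0] * 19 + [1000.0]
clipped = winsorize(data)

print(max(data), "->", round(max(clipped), 2))
```

Only the outlier moves: the ordinary points stay at 10.0, while the extreme value is pulled back to the upper edge of the window, which limits its influence on distance-based clustering.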
General Enhancements
- New shared settings
  - ODMS_BOXCOX: this setting enables the Box-Cox variance-stabilization transformation.
  - ODMS_EXPLOSION_MIN_SUPP: introduces a more efficient, data-driven encoding for high-cardinality categorical columns. You can define the minimum support required for the categorical values in explosion mapping.
  See Shared Settings.
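For readers unfamiliar with the Box-Cox transformation that ODMS_BOXCOX enables, the following sketch shows the standard textbook formula in plain Python: y = (x^λ - 1) / λ for λ ≠ 0, and y = ln(x) for λ = 0, defined for positive x. How the database chooses λ is not described here, so the λ values below are arbitrary illustrations, and the function name `boxcox` is hypothetical.

```python
import math


def boxcox(x, lam):
    """Standard Box-Cox transform of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox is defined only for positive values")
    if lam == 0:
        return math.log(x)  # limiting case as lam approaches 0
    return (x ** lam - 1.0) / lam


# Hypothetical series whose spread grows with its level.
series = [1.0, 4.0, 9.0, 16.0, 25.0]
transformed = [boxcox(x, 0.5) for x in series]

print(transformed)
```

With λ = 0.5 the transform is a shifted and scaled square root, so the widening gaps in the input become evenly spaced in the output, which is the variance-stabilizing effect the setting refers to.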
- Convert Pretrained Models to ONNX Format
  OML4Py enables the use of text transformers from Hugging Face by converting them into ONNX format models. OML4Py also adds the necessary tokenization and post-processing. The resulting ONNX pipeline is then imported into the database and can be used to generate embeddings for AI Vector Search. See ONNX Pipeline Models: Text Embedding.
- In-database ML models now record, in the model's metadata, the query string that was run to specify the build data. The build_source column in the all/user/dba_mining_models views shows the data query used to produce the model. See ALL_MINING_MODELS.
- Improved Performance of Partitioned Models
  Performance is improved for partitioned models with a high number of partitions and for dropping individual models within a partitioned model. To learn more about partitioned models, see DDL in Partitioned model.
- 4k Columns in Table
  Database tables can now accommodate up to 4,096 columns. This functionality is referred to as Wide Tables. To enable or disable Wide Tables for your Oracle database, use the MAX_COLUMNS parameter. See MAX_COLUMNS.
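The build_source information mentioned earlier in this list can be read with an ordinary query against one of the mining-model views. The sketch below is only an illustration: it takes any Python DB-API cursor, and it uses an in-memory SQLite table as a stand-in for the real view so that it runs without a database. The helper name `build_sources`, the model name, and the sample query text are all hypothetical; against Oracle Database you would pass a cursor from your own connection instead.

```python
import sqlite3

# Views named in the text above.
VIEWS = ("ALL_MINING_MODELS", "USER_MINING_MODELS", "DBA_MINING_MODELS")


def build_sources(cursor, view="USER_MINING_MODELS"):
    """Return {model_name: build_source} from one of the mining-model views."""
    view = view.upper()
    if view not in VIEWS:
        raise ValueError(f"unexpected view: {view}")
    cursor.execute(f"SELECT model_name, build_source FROM {view}")
    return dict(cursor.fetchall())


# Stand-in for a real database (assumption): an SQLite table with the
# same two columns, holding one made-up model.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE USER_MINING_MODELS (model_name TEXT, build_source TEXT)")
cur.execute(
    "INSERT INTO USER_MINING_MODELS VALUES (?, ?)",
    ("DEMO_MODEL", "SELECT * FROM DEMO_TRAIN"),
)

sources = build_sources(cur)
print(sources)
```

Restricting the view name to a fixed set keeps the string-built SQL safe, since a view name cannot be passed as a bind variable.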