Automatic Data Preparation
Machine learning models often require data transformations before training. Oracle Machine Learning (OML) automates this process using Automatic Data Preparation (ADP). ADP applies to OML4SQL, OML4Py, and OML4R in-database models, making data transformation easier.
When ADP is enabled, Oracle Machine Learning applies transformations based on the algorithm’s needs. These transformations include:
- Binning: Grouping numerical values into ranges.
- Normalization: Scaling values to a common range.
- Handling missing or sparse data: Managing gaps in data sets.
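To make the three transformation types concrete, here is a minimal sketch in plain Python. It does not reproduce the heuristics ADP uses internally. The function names, the choice of mean imputation, min-max scaling, and equal-width bins, and the sample data are all assumptions made for illustration.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]

def equal_width_bin(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical sample column with one missing value.
ages = [20, None, 40, 60]
filled = impute_mean(ages)           # missing value handling
scaled = min_max_normalize(filled)   # normalization
binned = equal_width_bin(filled, 2)  # binning
```

With ADP enabled, none of these steps needs to be written by hand; the sketch only shows what kind of work is being automated.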
ADP embeds these transformations in the model, along with any user-specified transformation instructions, so that they are applied whenever the model processes new data. Oracle Machine Learning uses consistent heuristics to choose transformations suited to each algorithm. This approach yields reasonable model quality in most cases.
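The idea of embedding transformations in a model can be sketched as follows. This is a toy illustration in plain Python, not how Oracle Machine Learning stores transformations. The class EmbeddedPrepModel, its methods, and the one-feature linear fit are hypothetical.

```python
class EmbeddedPrepModel:
    """Toy one-feature model that stores its own preparation parameters (hypothetical)."""

    def fit(self, xs, ys):
        # Learn the normalization parameters from the training data only.
        self.lo, self.hi = min(xs), max(xs)
        zs = [self._prepare(x) for x in xs]
        # Fit a least-squares line through the prepared feature.
        z_mean = sum(zs) / len(zs)
        y_mean = sum(ys) / len(ys)
        sxx = sum((z - z_mean) ** 2 for z in zs)
        sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
        self.slope = sxy / sxx
        self.intercept = y_mean - self.slope * z_mean
        return self

    def _prepare(self, x):
        # The stored parameters are reused for both training and scoring.
        return (x - self.lo) / (self.hi - self.lo)

    def predict(self, xs):
        # Callers pass raw values; preparation happens inside the model.
        return [self.intercept + self.slope * self._prepare(x) for x in xs]

model = EmbeddedPrepModel().fit([10, 20, 30], [1.0, 2.0, 3.0])
predictions = model.predict([15, 40])  # raw, unprepared inputs
```

Because the normalization parameters live inside the model object, the caller never has to repeat the preparation step when scoring, which is the property the paragraph above describes.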
You can:
- Use automatic transformations provided by ADP.
- Define custom transformations to fit your data needs.
- Manually handle transformations using database functions.
You can customize data preparation for:
- OML4SQL: Use the DBMS_DATA_MINING_TRANSFORM PL/SQL package.
- OML4Py: Specify transformations using a model settings list (params).
- OML4R: Use the odm.settings list or enable ADP directly (auto.data.prep=TRUE).
OML offers several features that significantly simplify the process of data preparation:
- Embedded data preparation: The transformations used in training the model are embedded in the model and run automatically whenever the model is applied to new data. If you specify transformations for the model, you only have to specify them once.
- Automatic management of missing values and sparse data: Oracle Machine Learning uses a consistent methodology across machine learning algorithms to handle sparsity and missing values.
- Transparency: Oracle Machine Learning provides model details, which are a view of the attributes that are internal to the model. This insight into the inner details of the model is possible because of reverse transformations, which map the transformed attribute values to a form that a user can interpret. Where possible, attribute values are reversed to the original column values. Reverse transformations are also applied to the target of a supervised model, so scoring results are in the same units as the original target.
- Tools for custom data preparation: Oracle Machine Learning provides many common transformation routines, for example the DBMS_DATA_MINING_TRANSFORM PL/SQL package in OML4SQL. You can use these routines, develop your own routines in SQL, or do both. You can use custom transformation instructions instead of ADP or together with ADP.
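The reverse transformations mentioned under Transparency can be illustrated with a small sketch. It assumes a min-max scaled target and equal-width bins; the function names and sample values are hypothetical and are not taken from the Oracle Machine Learning API.

```python
def fit_target_scaler(targets):
    """Return (shift, scale) so that (t - shift) / scale lies in [0, 1]."""
    lo, hi = min(targets), max(targets)
    return lo, hi - lo

def forward(t, shift, scale):
    """Transform a target value into the model's internal space."""
    return (t - shift) / scale

def reverse(t_scaled, shift, scale):
    """Map an internal value back to the original units."""
    return t_scaled * scale + shift

def bin_label(bin_index, lo, width):
    """Describe a bin by the range of original values it covers."""
    start = lo + bin_index * width
    return f"[{start}, {start + width})"

# Hypothetical training target in currency units.
prices = [100.0, 150.0, 300.0]
shift, scale = fit_target_scaler(prices)

# Suppose the model produced this value in its internal, scaled space.
scaled_prediction = 0.25
prediction_in_original_units = reverse(scaled_prediction, shift, scale)

# A binned attribute reported as a readable range instead of a bin index.
label = bin_label(1, lo=20, width=20)
```

Here the scaled prediction 0.25 is reported as 150.0 in the original units, and bin index 1 is reported as the range [40, 60), which is the kind of user-readable form that reverse transformations aim for.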