8.1.5.1 Define the Target Variable

The target variable (y), also known as the dependent variable or label, is the outcome that a machine learning model aims to predict or classify.

The target variable (y) can be specified as either an OML object or a string.
  • If y is a single-column OML object, its values must be compatible with the input data (x). For example, y must have the same number of rows as x.
  • If y is a string, it is the name of the column in x that contains the target values (labels). In this case, x must include a column with that name, and that column is used as the target for training or prediction.
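The two forms described above can be sketched with plain pandas objects standing in for OML objects. This is a minimal illustration, not OML4Py's implementation: `resolve_target` is a hypothetical helper, and the three-row data frame holds toy values chosen only for the demonstration.

```python
import pandas as pd


def resolve_target(x, y):
    """Hypothetical helper (not part of OML4Py): return (features, target).

    y may be a column name in x (string form) or a separate
    single-column object with one value per row of x (object form).
    """
    if isinstance(y, str):
        # String form: x itself must contain the named target column.
        if y not in x.columns:
            raise KeyError(f"target column {y!r} not found in x")
        return x.drop(columns=[y]), x[y]
    # Object form: y is supplied separately and must line up with x row for row.
    if len(y) != len(x):
        raise ValueError("y must have the same number of rows as x")
    return x, y


# Toy data for illustration only.
data = pd.DataFrame({
    "Sepal_Length": [5.1, 7.0, 6.3],
    "Petal_Length": [1.4, 4.7, 6.0],
    "Species": ["setosa", "versicolor", "virginica"],
})

# Form 1: the target is passed as a separate single-column object.
features_a, target_a = resolve_target(data.drop(columns=["Species"]), data["Species"])

# Form 2: the target is passed as the name of a column inside x.
features_b, target_b = resolve_target(data, "Species")
```

In this sketch both forms yield the same features and target. The difference lies only in whether the caller separates the target column beforehand or leaves it inside x and passes its name.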

Example 8-9 Target Variable as an OML Object or a String

This example demonstrates the two different ways to specify the target variable.

# Assumes an active OML4Py database connection and an existing IRIS table.
import oml

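# Option 1: Specify the target variable as an OML object.
# split() returns a list of data sets; the first is used here for
# training and the second for testing.
dat = oml.sync(table='IRIS').split()
train_x = dat[0].drop('Species')   # predictors only
train_y = dat[0]['Species']        # single-column OML object with the labels
test_dat = dat[1]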

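# Option 2: Specify the target variable as a string.
# The training data keeps the 'Species' column, and y is only its name.
dat = oml.sync(table='IRIS').split()
train_dat = dat[0]                 # predictors and the target column
train_y = 'Species'                # name of the target column in train_dat
test_dat = dat[1]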