11.5.3 Run a User-Defined Python Function on the Specified Data
Use the oml.table_apply function to run a Python function on data that you specify with the data parameter.
The oml.table_apply function runs a user-defined Python function in a Python engine spawned and managed by the database environment. With the func parameter, you can supply a Python function or you can specify the name of a user-defined Python function in the OML4Py script repository.
The syntax of the function is the following:
oml.table_apply(data, func, func_owner=None, graphics=False, **kwargs)
The data argument is an oml.DataFrame that contains the data that the func function operates on.
The func argument is the function to run. It may be one of the following:
- A Python function
- A string that is the name of a user-defined Python function in the OML4Py script repository
- A string that defines a Python function
- An oml.script.script.Callable object returned by the oml.script.load function
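A string that defines a Python function can be checked locally before it is supplied as func. The following is a minimal sketch under stated assumptions: the function name row_count and the list used as stand-in data are hypothetical, and the local exec call is only a sanity check, not a description of how OML4Py processes the string.

```python
# Source text of a user-defined function, held in a string.
# The name row_count is hypothetical and used only for illustration.
func_source = """def row_count(dat):
    # len() works for a pandas.DataFrame and for the plain list
    # used in the local check below.
    return len(dat)
"""

# Local sanity check: compile the string and confirm that it defines
# a callable with the expected name.
namespace = {}
exec(func_source, namespace)
row_count = namespace["row_count"]

# A plain list stands in for the data here.
local_result = row_count([10, 20, 30])
print(local_result)

# With a database connection, the same string could then be supplied as
# the func argument (assumption: oml_iris is an existing oml.DataFrame):
# res = oml.table_apply(data=oml_iris, func=func_source)
```

Checking the string locally first catches syntax errors before the function is sent to the database-managed Python engine.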
The optional func_owner argument is a string or None (the default) that specifies the owner of the registered user-defined Python function when the func argument is the name of a registered user-defined Python function.
The graphics argument is a Boolean that specifies whether to look for images rendered in the user-defined Python function. The default value is False.
With the **kwargs parameter, you can pass additional arguments to the func function. Special control arguments, which start with oml_, are not passed to the function specified by func, but instead control what happens before or after the execution of the function.
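The separation between ordinary keyword arguments and control arguments can be pictured with a small local stand-in. This sketch is not OML4Py code: the helper split_kwargs is hypothetical, the values are placeholders, and oml_na_omit appears only as a sample key that carries the oml_ prefix.

```python
def split_kwargs(**kwargs):
    """Partition keyword arguments by the oml_ prefix.

    Hypothetical helper that mimics the rule described above: keys that
    start with oml_ are treated as control arguments, and all other keys
    would be forwarded to the user-defined function.
    """
    control = {k: v for k, v in kwargs.items() if k.startswith("oml_")}
    user = {k: v for k, v in kwargs.items() if not k.startswith("oml_")}
    return control, user


# regr would be forwarded to func; the oml_ key would not.
control, user = split_kwargs(regr="model-placeholder", oml_na_omit=True)
print("control arguments:", control)
print("passed to func:", user)
```

In practice this means that a user-defined function should not declare parameters whose names begin with oml_, because such arguments would never reach it.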
The oml.table_apply function returns a Python object or an oml.embed.data_image._DataImage. If no image is rendered in the user-defined Python function, oml.table_apply returns whatever Python object is returned by the function. Otherwise, it returns an oml.embed.data_image._DataImage object.
Example 11-7 Using the oml.table_apply Function
This example builds a regression model using in-memory data, and then uses the oml.table_apply function to predict using the model on the first 10 rows of the IRIS table.
import oml
import pandas as pd
from sklearn import datasets
from sklearn import linear_model

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
                 columns = ['Sepal_Length','Sepal_Width',
                            'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
                           {0: 'setosa', 1: 'versicolor',
                            2:'virginica'}[x], iris.target)),
                 columns = ['Species'])

# Drop the IRIS database table if it exists.
try:
    oml.drop('IRIS')
except:
    pass

# Create the IRIS database table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Build a regression model using in-memory data.
iris = oml_iris.pull()
regr = linear_model.LinearRegression()
regr.fit(iris[['Sepal_Width', 'Petal_Length', 'Petal_Width']],
         iris[['Sepal_Length']])
regr.coef_

# Use oml.table_apply to predict using the model on the first 10
# rows of the IRIS table.
def predict(dat, regr):
    import pandas as pd
    pred = regr.predict(dat[['Sepal_Width', 'Petal_Length',
                             'Petal_Width']])
    return pd.concat([dat,pd.DataFrame(pred)], axis=1)

res = oml.table_apply(data=oml_iris.head(n=10),
                      func=predict, regr=regr)
res
Listing for This Example
>>> import oml
>>> import pandas as pd
>>> from sklearn import datasets
>>> from sklearn import linear_model
>>>
>>> # Load the iris data set and create a pandas.DataFrame for it.
... iris = datasets.load_iris()
>>>
>>> x = pd.DataFrame(iris.data,
...                  columns = ['Sepal_Length','Sepal_Width',
...                             'Petal_Length','Petal_Width'])
>>> y = pd.DataFrame(list(map(lambda x:
...                            {0: 'setosa', 1: 'versicolor',
...                             2:'virginica'}[x], iris.target)),
...                  columns = ['Species'])
>>>
>>> # Drop the IRIS database table if it exists.
... try:
...     oml.drop('IRIS')
... except:
...     pass
>>>
>>> # Create the IRIS database table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Build a regression model using in-memory data.
... iris = oml_iris.pull()
>>> regr = linear_model.LinearRegression()
>>> regr.fit(iris[['Sepal_Width', 'Petal_Length', 'Petal_Width']],
...          iris[['Sepal_Length']])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
                 normalize=False)
>>> regr.coef_
array([[ 0.65083716, 0.70913196, -0.55648266]])
>>>
>>> # Use oml.table_apply to predict using the model on the first 10
... # rows of the IRIS table.
... def predict(dat, regr):
...     import pandas as pd
...     pred = regr.predict(dat[['Sepal_Width', 'Petal_Length',
...                              'Petal_Width']])
...     return pd.concat([dat,pd.DataFrame(pred)], axis=1)
...
>>> res = oml.table_apply(data=oml_iris.head(n=10),
...                       func=predict, regr=regr)
>>> res
   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width
0           4.6          3.6             1          0.2
1           5.1          2.5             3          1.1
2           6.0          2.2             4          1.0
3           5.8          2.6             4          1.2
4           5.5          2.3             4          1.3
5           5.5          2.5             4          1.3
6           6.1          2.8             4          1.3
7           5.7          2.5             5          2.0
8           6.0          2.2             5          1.5
9           6.3          2.5             5          1.9

      Species         0
0      setosa  4.796847
1  versicolor  4.998355
2  versicolor  5.567884
3  versicolor  5.716923
4  versicolor  5.466023
5  versicolor  5.596191
6   virginica  5.791442
7   virginica  5.915785
8   virginica  5.998775
9   virginica  5.971433
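The predictions in column 0 can be cross-checked against the coefficients printed by regr.coef_. The listing does not show the intercept, so the sketch below compares differences between pairs of rows, in which the intercept cancels. It assumes that the displayed feature values are exact; the helper names expected_gap and observed_gap are hypothetical.

```python
# Coefficients from regr.coef_ in the listing, in the order
# Sepal_Width, Petal_Length, Petal_Width.
coef = (0.65083716, 0.70913196, -0.55648266)

# (Sepal_Width, Petal_Length, Petal_Width, prediction), copied from
# output rows 2, 4, 5, and 8 of the listing.
rows = {
    2: (2.2, 4.0, 1.0, 5.567884),
    4: (2.3, 4.0, 1.3, 5.466023),
    5: (2.5, 4.0, 1.3, 5.596191),
    8: (2.2, 5.0, 1.5, 5.998775),
}


def expected_gap(a, b):
    """Prediction difference implied by the coefficients alone."""
    return sum(c * (rows[b][i] - rows[a][i]) for i, c in enumerate(coef))


def observed_gap(a, b):
    """Prediction difference read from the listing output."""
    return rows[b][3] - rows[a][3]


# Rows 4 and 5 differ only in Sepal_Width; rows 2 and 8 differ in
# Petal_Length and Petal_Width.
for a, b in [(4, 5), (2, 8)]:
    print(a, b, round(expected_gap(a, b), 6), round(observed_gap(a, b), 6))
```

If the two values printed for each pair agree to within the rounding of the listing, the output of oml.table_apply is consistent with the model that was fit in memory.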