7.3.5 Create a Persistent Database Table from a Python Data Set
Use the oml.create
function to create a persistent table in your database schema from data in your Python session.
The oml.create
function creates a table in the database schema and returns an oml.DataFrame
object that is a proxy for the table. The proxy oml.DataFrame
object has the same name as the table.
Note:
When creating a table in Oracle Machine Learning for Python, if you use lowercase or mixed case for the name of the table, then you must use the same lowercase or mixed case name in double quotation marks when using the table in a SQL query or function. If, instead, you use an all uppercase name when creating the table, then the table name is case-insensitive: you can use uppercase, lowercase, or mixed case when using the table without using double quotation marks. The same is true for naming columns in a table.You can delete the persistent table in a database schema with the oml.drop
function.
Caution:
Use theoml.drop
function to delete a persistent database table. Use the del
statement to remove an oml.DataFrame
proxy object and its associated temporary table; del
does not delete a persistent table.
The syntax of the oml.create
function is the following:
oml.create(x, table, oranumber=True, dbtypes=None, append=False)
The x
argument is a pandas.DataFrame
or a list of tuples of equal size that contain the data for the table. For a list of tuples, each tuple represents a row in the table and the column names are set to COL1, COL2, and so on. The table
argument is a string that specifies a name for the table.
The SQL data types of the columns are determined by the following:
-
OML4Py determines default column types by looking at 20 random rows sampled from the table. For tables with less than 20 rows, it uses all rows in determining the column type.
If the values in a column are all
None
, or if a column has inconsistent data types that are notNone
in the sampled rows, then a default column type cannot be determined and aValueError
is raised unless a SQL type for the column is specified by thedbtypes
argument. -
For numeric columns, the
oranumber
argument, which is abool
, determines the SQL data type. IfTrue
(the default), then the SQL data type is NUMBER. IfFalse
, then the data type is BINARY DOUBLE.If the data in
x
contains NaN values, then you should setoranumber
toFalse
. -
For string columns, the default type is VARCHAR2(4000).
-
For bytes columns, the default type is BLOB.
With the dbtypes
parameter, you can specify the SQL data types for the table columns. The values of dbtypes
may be either a dict
that maps str
to str
values or a list of str
values. For a dict
, the keys are the names of the columns. The dbtypes
parameter is ignored if the append
argument is True
.
The append
argument is a bool
that specifies whether to append the x
data to an existing table.
Example 7-13 Creating Database Tables from a Python Data Set
This example creates a cursor
object for the database connection, creates a pandas.core.frame.DataFrame
with columns of various data types, then creates a series of tables using different oml.create
parameters and shows the SQL data types of the table columns.
import oml
# Create a cursor object for the current OML4Py database
# connection to run queries and get information from the database.
cr = oml.cursor()
import pandas as pd
df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
'string' : [None, None, 'a', 'a', 'a', 'b'],
'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
# Get the order of the columns
df.columns
# Create a table with the default parameters.
oml_df1 = oml.create(df, table = 'tbl1')
# Show the default SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
cr.fetchall()
# Create a table with oranumber set to False.
oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
# Show the SQL data typea of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
cr.fetchall()
# Create a table with dbtypes specified as a dict mapping column names
# to SQL data types.
oml_df3 = oml.create(df, table = 'tbl3',
dbtypes = {'numeric': 'BINARY_DOUBLE',
'bytes':'RAW(1)'})
# Show the SQL data types of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
cr.fetchall()
# Create a table with dbtypes specified as a list of SQL data types
# matching the order of the columns.
oml_df4 = oml.create(df, table = 'tbl4',
dbtypes = ['BINARY_DOUBLE','VARCHAR2(2000)','RAW(1)'])
# Show the SQL data type of the columns.
_ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
cr.fetchall()
# Create a table from a list of tuples.
lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
(3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
oml_df5 = oml.create(lst, table = 'tbl5',
dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
# Close the cursor
cr.close()
# Drop the tables.
oml.drop('tbl1')
oml.drop('tbl2')
oml.drop('tbl3')
oml.drop('tbl4')
oml.drop('tbl5')
Listing for This Example
>>> import oml
>>>
>>> # Create a cursor object for the current OML4Py database
... # connection to run queries and get information from the database.
... cr = oml.cursor()
>>>
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, 2],
... 'string' : [None, None, 'a', 'a', 'a', 'b'],
... 'bytes' : [b'a', b'b', b'c', b'c', b'd', b'e']})
>>>
>>> # Get the order of the columns.
... df.columns
Index(['numeric', 'string', 'bytes'], dtype='object')
>>>
>>> # Create a table with the default parameters.
... oml_df1 = oml.create(df, table = 'tbl1')
>>>
>>> # Show the default SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl1'")
>>> cr.fetchall()
[('NUMBER',), ('VARCHAR2',), ('BLOB',)]
>>>
>>> # Create a table with oranumber set to False.
... oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
>>>
>>> # Show the SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl2'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('BLOB',)]
>>>
>>> # Create a table with dbtypes specified as a dict mapping column names
... # to SQL data types.
... oml_df3 = oml.create(df, table = 'tbl3',
... dbtypes = {'numeric': 'BINARY_DOUBLE',
... 'bytes':'RAW(1)'})
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl3'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>>
>>> # Create a table with dbtypes specified as a list of SQL data types
... # matching the order of the columns.
... oml_df4 = oml.create(df, table = 'tbl4',
... dbtypes = ['BINARY_DOUBLE','VARCHAR2(2000)', 'RAW(1)'])
>>>
>>> # Show the SQL data type of the columns
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl4'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>>
>>> # Create a table from a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
... (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> oml_df5 = oml.create(lst, table ='tbl5',
... dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name = 'tbl5'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Close the cursor.
... cr.close()
>>>
>>> # Drop the tables
... oml.drop('tbl1')
>>> oml.drop('tbl2')
>>> oml.drop('tbl3')
>>> oml.drop('tbl4')
>>> oml.drop('tbl5')
Parent topic: Move Data Between the Database and a Python Session