Store Data into Database Tables or Files
Working on a machine learning task may involve data transformation
activities (such as creating meaningful features, data cleaning, encoding categorical
variables, and so on) to make the data more useful. You can then store these data changes in
the database or files using the write
method of the
SpatialDataFrame
class.
All transformations of a SpatialDataFrame
do not have
any direct effect in the database tables prior to calling the write
method.
The following table describes the main parameters of the
write
function.
Parameter | Description |
---|---|
dataset |
This parameter is an instance of
SpatialDataset , and represents the resulting
dataset.
|
if_exists |
The options are fail and
replace . Determines the action to take if the
resulting dataset already exists.
|
include_index |
If True , the index columns of the
instance are written in the result.
|
create_spatial_metadata |
If the spatial metadata needs to be created, then this
value is set to True . Used only for Oracle Spatial
database datasets.
|
create_spatial_index |
This parameter is used only for Oracle Spatial database
datasets, and it is True if a spatial index needs to be
created.
|
See the SpatialDataFrame.write function in Python API Reference for Oracle Spatial AI for more information.
The following example uses the block_groups
SpatialDataFrame
and performs the following steps:
- Adds a new column with a categorical variable called
INCOME_LABEL
. - Calls the
add_column
method which returns a new instance ofSpatialDataFrame
with the extended dataset. - Calls the
write
method to store the data in the database. - Loads the data from the recently created table in a
SpatialDataFrame
and verifies the newly added column.
# The column INCOME_LABEL is not in the dataset
if "INCOME_LABEL" not in block_groups.columns:
print("The column INCOME_LABEL is not part of the columns of block_groups")
# Create the variable "INCOME_LABEL" based on the median income
block_groups_extended = block_groups.add_column("INCOME_LABEL", pd.qcut(block_groups['MEDIAN_INCOME'].values, [0, 0.5, 1], labels=[0, 1]).to_list())
# Store the extended data in the database
block_groups_extended.write(DBSpatialDataset(table='write_test'),
if_exists='replace',
create_spatial_index=True)
# Load the stored dataset in a new SpatialDataFrame
block_groups_new = SpatialDataFrame.create(DBSpatialDataset(table='write_test'))
# The column INCOME_LABEL is contained in the dataset's columns
if "INCOME_LABEL" in block_groups_new.columns:
print("The column INCOME_LABEL is contained in the columns of block_groups_new")
The output confirms that the new column, INCOME_LABEL
, is part of the
dataset stored in the database.
The column INCOME_LABEL is not part of the columns of block_groups
The column INCOME_LABEL is contained in the columns of block_groups_new