Store Data into Database Tables or Files

Working on a machine learning task may involve data transformation activities (such as creating meaningful features, data cleaning, encoding categorical variables, and so on) to make the data more useful. You can then store these data changes in the database or files using the write method of the SpatialDataFrame class.

All transformations of a SpatialDataFrame do not have any direct effect in the database tables prior to calling the write method.

The following table describes the main parameters of the write function.

Parameter Description
dataset This parameter is an instance of SpatialDataset, and represents the resulting dataset.
if_exists The options are fail and replace. Determines the action to take if the resulting dataset already exists.
include_index If True, the index columns of the instance are written in the result.
create_spatial_metadata If the spatial metadata needs to be created, then this value is set to True . Used only for Oracle Spatial database datasets.
create_spatial_index This parameter is used only for Oracle Spatial database datasets, and it is True if a spatial index needs to be created.

See the SpatialDataFrame.write function in Python API Reference for Oracle Spatial AI for more information.

The following example uses the block_groups SpatialDataFrame and performs the following steps:

  1. Adds a new column with a categorical variable called INCOME_LABEL.
  2. Calls the add_column method which returns a new instance of SpatialDataFrame with the extended dataset.
  3. Calls the write method to store the data in the database.
  4. Loads the data from the recently created table in a SpatialDataFrame and verifies the newly added column.
# The column INCOME_LABEL is not in the dataset
if "INCOME_LABEL" not in block_groups.columns:
    print("The column INCOME_LABEL is not part of the columns of block_groups")
 
# Create the variable "INCOME_LABEL" based on the median income
block_groups_extended = block_groups.add_column("INCOME_LABEL", pd.qcut(block_groups['MEDIAN_INCOME'].values, [0, 0.5, 1], labels=[0, 1]).to_list())
 
 
# Store the extended data in the database
block_groups_extended.write(DBSpatialDataset(table='write_test'), 
                            if_exists='replace', 
                            create_spatial_index=True)
 
# Load the stored dataset in a new SpatialDataFrame
block_groups_new = SpatialDataFrame.create(DBSpatialDataset(table='write_test'))
 
# The column INCOME_LABEL is contained in the dataset's columns
if "INCOME_LABEL" in block_groups_new.columns:
    print("The column INCOME_LABEL is contained in the columns of block_groups_new")

The output confirms that the new column, INCOME_LABEL, is part of the dataset stored in the database.

The column INCOME_LABEL is not part of the columns of block_groups
The column INCOME_LABEL is contained in the columns of block_groups_new