Load the Data
Perform the following steps to load the data:
- Create an instance of SpatialDataFrame. The census dataset is stored in the la_block_groups table in the database. To load it into Python, use a DBSpatialDataset and create an instance of SpatialDataFrame.

  ```python
  import oml
  from oraclesai import SpatialDataFrame, DBSpatialDataset

  # Load the la_block_groups table from the oml_user schema into a SpatialDataFrame
  block_groups = SpatialDataFrame.create(DBSpatialDataset(table='la_block_groups', schema='oml_user'))
  ```

  The dataset contains information about different regions in the city of Los Angeles. Features such as MEDIAN_INCOME and HOUSE_VALUE provide information about each region's income and housing, and other features provide demographic information about gender, race, and age.
- Review the variables of the SpatialDataFrame instance, shown in the following table, and define the columns that represent the target variable, the explanatory variables, and the geometries.

  | Variable | Description |
  |---|---|
  | MEDIAN_INCOME | The target variable representing the median income. |
  | MEAN_AGE | The average age in the region. |
  | MEAN_EDUCATION_LEVEL | Score based on the different education levels listed in the Census table. |
  | HOUSE_VALUE | Median value of houses in the region. |
  | PER_WHITE | Proportion of the white population in the region. |
  | PER_BLACK | Proportion of the black population in the region. |

  The following code selects a subset of columns from the SpatialDataFrame instance: the target variable, the explanatory variables, and the geometry column.

  ```python
  # MEDIAN_INCOME is the target, 'geometry' holds the region geometries,
  # and the remaining columns are the explanatory variables
  X = block_groups[['MEDIAN_INCOME', 'MEAN_AGE', 'MEAN_EDUCATION_LEVEL', 'HOUSE_VALUE', 'INTERNET', 'geometry']]
  ```
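- Define the training, validation, and test sets.
  - Split the data into training and test sets using the spatial_train_test_split function from oraclesai.preprocessing. Assign 20% of the data for testing.

    ```python
    from oraclesai.preprocessing import spatial_train_test_split

    # Hold out 20% of the rows as the test set; keep the combined
    # training + validation data and the test data, discard the other four return values
    X_train_valid, X_test, _, _, _, _ = spatial_train_test_split(X, y="MEDIAN_INCOME", test_size=0.2, random_state=32)
    ```

  - Split the remaining 80% of the data again to create the training and validation sets, using 10% of it for validation and the rest for training. Because test_size=0.1 applies to the remaining 80%, the validation set is 8% of the full dataset and the training set is 72%. The validation set is helpful for evaluating the model's performance before using it with the test set.

    ```python
    # Hold out 10% of the training + validation data as the validation set
    X_train, X_valid, _, _, _, _ = spatial_train_test_split(X_train_valid, y="MEDIAN_INCOME", test_size=0.1, random_state=32)
    ```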
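
The column selection in the second step can also be written with each role named explicitly, which makes it easier to swap the target or the explanatory variables later. The following is a minimal sketch in plain Python under stated assumptions: the names TARGET, EXPLANATORY, GEOMETRY, and selected_columns are hypothetical and are not part of oraclesai, and the available list is an illustrative stand-in for the table's real column names.

```python
# Hypothetical helper (not part of oraclesai): name each column role explicitly.
TARGET = "MEDIAN_INCOME"
EXPLANATORY = ["MEAN_AGE", "MEAN_EDUCATION_LEVEL", "HOUSE_VALUE", "INTERNET"]
GEOMETRY = "geometry"


def selected_columns(available):
    """Return the target, explanatory, and geometry columns in that order.

    Raises KeyError if any requested column is missing from `available`.
    """
    wanted = [TARGET, *EXPLANATORY, GEOMETRY]
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return wanted


# Illustrative stand-in for the table's column names (assumption, not real metadata).
available = ["MEDIAN_INCOME", "MEAN_AGE", "MEAN_EDUCATION_LEVEL", "HOUSE_VALUE",
             "INTERNET", "PER_WHITE", "PER_BLACK", "geometry"]
print(selected_columns(available))
```

The returned list matches the one indexed in the column-selection step, so passing it as block_groups[selected_columns(...)] would select the same columns, assuming you supply the frame's actual column names.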
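
The two-stage split leaves 72% of the rows for training, 8% for validation, and 20% for testing. The following sketch reproduces only those proportions using a plain random hold-out split in standard-library Python. It is an illustration under stated assumptions, not how spatial_train_test_split works internally: it ignores geometry entirely, the function name two_stage_split is hypothetical, and its seed does not reproduce the rows that random_state=32 selects in oraclesai.

```python
import random


def two_stage_split(rows, test_size=0.2, valid_size=0.1, seed=32):
    """Plain random (NOT spatial) hold-out split in two stages.

    Stage 1 holds out `test_size` of all rows as the test set.
    Stage 2 holds out `valid_size` of the *remaining* rows as the validation set.
    """
    rng = random.Random(seed)      # local generator; does not match oraclesai's random_state
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n_test = round(len(shuffled) * test_size)
    test, train_valid = shuffled[:n_test], shuffled[n_test:]

    n_valid = round(len(train_valid) * valid_size)
    valid, train = train_valid[:n_valid], train_valid[n_valid:]
    return train, valid, test


train, valid, test = two_stage_split(range(1000))
print(len(train), len(valid), len(test))  # 720 80 200
```

Holding out the test set first and carving the validation rows out of the remainder keeps the test rows out of any decisions made while tuning on the validation set.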