3.1.3 Build Model
In this step, you build a classification model, which learns from the training data to assign records to predefined categories.
Algorithm Selection
You can choose one of the following in-database algorithms to solve a classification problem:
- Decision Tree
- Generalized Linear Model
- Naive Bayes
- Neural Network
- Random Forest
- Support Vector Machine
Here you will use the Support Vector Machine (SVM) algorithm, because SVM classification supports binary classification, which is what this use case requires.
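To make "binary classification" concrete: with a linear kernel, a trained SVM assigns a record to one of two classes based on the sign of the decision function f(x) = w · x + b. The base R sketch below shows only that decision rule. The weights `w`, bias `b`, and the two records are hypothetical values chosen for illustration, not anything learned by `ore.odmSVM`. Also note that with `kernel.function = "system.determined"` the database may choose a Gaussian kernel rather than a linear one, in which case the decision function has a different form.

```r
# Minimal sketch of the linear SVM decision rule for a binary problem.
# w and b are HYPOTHETICAL values for illustration; they are not learned
# by ore.odmSVM or taken from any real model.
w <- c(0.8, -0.5)   # hypothetical weight for each of two predictors
b <- -0.1           # hypothetical bias (intercept)

# Decision function f(x) = w . x + b
svm_decision <- function(x, w, b) sum(w * x) + b

# The sign of f(x) picks one of the two classes, labeled here as 1 and 0
svm_class <- function(x, w, b) if (svm_decision(x, w, b) >= 0) 1L else 0L

# Two hypothetical records with two numeric predictors each
x1 <- c(1.0, 0.2)
x2 <- c(0.1, 1.5)
svm_class(x1, w, b)  # f = 0.8 - 0.1 - 0.1 = 0.6    -> class 1
svm_class(x2, w, b)  # f = 0.08 - 0.75 - 0.1 = -0.77 -> class 0
```

Training an SVM means finding the `w` and `b` that separate the two classes with the widest margin; the in-database algorithm does that work for you.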
- Split the data into train and test sets. The model learns hidden patterns from the train set, and the test set is used to evaluate the trained model on records it has not seen. Split the DEMO_DF data so that 60 percent of the records go to the train set and 40 percent to the test set:

  # Randomly choose 40 percent of the row positions for the test set
  sampleSize <- .4 * nrow(DEMO_DF)
  index <- sample(1:nrow(DEMO_DF), sampleSize)
  # 1 for rows selected for the test set, 0 for all other rows
  group <- as.integer(1:nrow(DEMO_DF) %in% index)
  # Use CUST_ID values as the row names of the ore.frame
  rownames(DEMO_DF) <- DEMO_DF$CUST_ID
  DEMO_DF.train <- DEMO_DF[group==FALSE,]   # rows not selected: 60 percent
  class(DEMO_DF.train)
  DEMO_DF.test <- DEMO_DF[group==TRUE,]     # rows selected: 40 percent
  class(DEMO_DF.test)

  'ore.frame'
  'ore.frame'
- After splitting the data, check the dimensions of the train and test sets to confirm that no rows were left out of either set:

  cat("\nTraining data: ")
  dim(DEMO_DF.train)
  cat("\nTest data: ")
  dim(DEMO_DF.test)

  Training data: 2700 13
  Test data: 1800 13

  The 2,700 training rows and 1,800 test rows add up to all 4,500 rows of DEMO_DF, and both sets keep all 13 columns.
- Build your model with the ore.odmSVM function, which is the R interface to the in-database SVM algorithm and creates a Support Vector Machine model from the training data. Then use the model to make predictions for the test data. The code first drops any existing model named SVM_CLASSIFICATION_MODEL so that the model can be re-created under that name:

  # Drop the model if it already exists; the EXCEPTION clause ignores the
  # error raised when there is no model with this name
  ore.exec(
    "BEGIN DBMS_DATA_MINING.DROP_MODEL(model_name => 'SVM_CLASSIFICATION_MODEL'); EXCEPTION WHEN OTHERS THEN NULL; END;"
  )

  # Build an in-database SVM classification model on the training data
  MOD <- ore.odmSVM(
    formula = AFFINITY_CARD ~ .,
    data = DEMO_DF.train,
    type = "classification",
    kernel.function = "system.determined",
    odm.settings = list(model_name = "SVM_CLASSIFICATION_MODEL")
  )

  # Score the test data; supplemental.cols lists the input columns to carry
  # over into the result alongside the predictions
  RES <- predict(
    object = MOD,
    data = DEMO_DF.test,
    type = c("raw", "class"),
    norm.votes = TRUE,
    cache.model = TRUE,
    supplemental.cols = c(
      "CUST_ID", "AFFINITY_CARD", "EDUCATION",
      "HOUSEHOLD_SIZE", "OCCUPATION", "YRS_RESIDENCE"
    )
  )
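The split step above can be tried on an ordinary local data.frame without a database connection. This base R sketch follows the same approach (sample 40 percent of the row positions for the test set and keep the rest for training) on a hypothetical 10-row table named `demo` with made-up values. It adds `set.seed()`, which the original code does not call, so that the split is reproducible from run to run, and it uses a logical flag in place of comparing an integer vector with `FALSE`/`TRUE`; both forms select the same rows.

```r
# Minimal sketch of a 60/40 train/test split on a local data.frame.
# `demo` is a HYPOTHETICAL stand-in for DEMO_DF with made-up values.
set.seed(42)  # assumption: fix the random stream so the split is reproducible

demo <- data.frame(
  CUST_ID       = 101:110,
  AFFINITY_CARD = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 1)
)

n           <- nrow(demo)
sample_size <- round(0.4 * n)                # 40 percent of rows go to test
test_idx    <- sample(seq_len(n), sample_size)
in_test     <- seq_len(n) %in% test_idx      # TRUE for rows chosen for test

demo_train <- demo[!in_test, ]               # remaining 60 percent
demo_test  <- demo[in_test, ]

c(train = nrow(demo_train), test = nrow(demo_test))  # train = 6, test = 4
```

Which rows land in each set depends on the seed, but the sizes do not: 4 of the 10 rows always go to the test set.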
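The row-count check above confirms that nothing was lost, but matching counts alone would not reveal a row that landed in both sets while another went missing. A slightly stronger check compares the ID column as well. In this base R sketch, `check_split()` is a hypothetical helper (not part of OML4R), and the data are made up. Applying it to the ore.frame split would mean first bringing the CUST_ID columns into local memory, for example with `ore.pull`, which is an assumption about your workflow.

```r
# Minimal sketch: check that a train/test split neither drops nor duplicates rows.
# check_split() is a HYPOTHETICAL helper, not an OML4R function.
check_split <- function(full, train, test, id_col) {
  ids_train <- train[[id_col]]
  ids_test  <- test[[id_col]]
  list(
    # Same total number of rows as the full data
    rows_match = nrow(train) + nrow(test) == nrow(full),
    # No ID appears in both sets
    no_overlap = length(intersect(ids_train, ids_test)) == 0,
    # Every ID of the full data appears in exactly the union of the two sets
    all_kept   = setequal(c(ids_train, ids_test), full[[id_col]])
  )
}

# Hypothetical local data, split by hand: rows 1-3 train, rows 4-5 test
full  <- data.frame(CUST_ID = 1:5, AGE = c(34, 51, 27, 45, 62))
train <- full[1:3, ]
test  <- full[4:5, ]
str(check_split(full, train, test, "CUST_ID"))

# A faulty split: row 3 is in both sets and row 5 is missing
bad <- check_split(full, full[1:3, ], full[3:4, ], "CUST_ID")
str(bad)
```

In the faulty split the row counts still add up to 5, which is why the ID-based checks are worth running alongside `dim()`.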
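After `predict` scores the test set, a common next step is to compare the predicted class with the actual AFFINITY_CARD value. This base R sketch builds a confusion matrix and computes accuracy. It assumes the scored rows have been brought into local memory (for example with `ore.pull(RES)`) and that the predicted class is stored in a column named `PREDICTION`; that column name is an assumption, so check `names(RES)` first. The `res_local` data below are hypothetical values, not results from this model.

```r
# Minimal sketch of evaluating binary classification results locally.
# `res_local` is a HYPOTHETICAL stand-in for the scored test rows, and the
# PREDICTION column name is an assumption (check names(RES)).
res_local <- data.frame(
  CUST_ID       = 1:8,
  AFFINITY_CARD = c(0, 0, 1, 1, 0, 1, 0, 1),  # actual class
  PREDICTION    = c(0, 1, 1, 0, 0, 1, 0, 1)   # predicted class
)

# Confusion matrix: rows are actual classes, columns are predicted classes
conf_mat <- table(actual    = res_local$AFFINITY_CARD,
                  predicted = res_local$PREDICTION)
print(conf_mat)

# Accuracy: share of rows on the diagonal (correct predictions)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy  # 6 correct out of 8 -> 0.75
```

Reading the off-diagonal cells separately shows which kind of mistake the model makes more often (predicting 1 for an actual 0, or the reverse), which accuracy alone hides.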