3.1.3 Build Model

This model is designed to classify data into predefined categories by learning from training data.

Algorithm Selection

You can choose one of the following in-database algorithms to solve a classification problem:

  • Decision Tree
  • Generalized Linear Model
  • Naive Bayes
  • Neural Network
  • Random Forest
  • Support Vector Machine

This example uses the Support Vector Machine (SVM) algorithm because SVM classification is one of the algorithms that supports binary classification.

  1. Split the data into train and test data sets. The train set is used to train the model so that it learns the hidden patterns, and the test set is used to evaluate the trained model. Split the DEMO_DF data so that 60 percent of the records go to the train data set and 40 percent go to the test data set.
    # Randomly select 40 percent of the row positions for the test set
    sampleSize <- .4 * nrow(DEMO_DF)
    index <- sample(1:nrow(DEMO_DF),sampleSize)
    # group is 1 for rows selected for the test set and 0 otherwise
    group <- as.integer(1:nrow(DEMO_DF) %in% index)
     
    # Use CUST_ID as the row names of DEMO_DF
    rownames(DEMO_DF) <- DEMO_DF$CUST_ID
    DEMO_DF.train <- DEMO_DF[group==FALSE,]
    class(DEMO_DF.train)
     
    DEMO_DF.test <- DEMO_DF[group==TRUE,]
    class(DEMO_DF.test)
     
    'ore.frame'
    'ore.frame'
  2. After splitting the data, check the dimensions of the train and test data sets to confirm that no rows were left out of either one.
    cat("\nTraining data: ")
    dim(DEMO_DF.train)
    cat("\nTest data: ")
    dim(DEMO_DF.test)
     
     
      Training data:  2700 13
      Test data: 1800 13
  3. Build your model using the ore.odmSVM function, which is the R interface to the in-database SVM algorithm and creates a Support Vector Machine model from the training data. The ore.exec call first drops any existing model named SVM_CLASSIFICATION_MODEL so that the name can be reused. Then use the model to make predictions on the test data.
    ore.exec(
      "BEGIN DBMS_DATA_MINING.DROP_MODEL(model_name => 'SVM_CLASSIFICATION_MODEL');
       EXCEPTION WHEN OTHERS THEN NULL; END;"
    )
     
    MOD <- ore.odmSVM(
      formula = AFFINITY_CARD ~ .,
      data = DEMO_DF.train,
      type = "classification",
      kernel.function = "system.determined",
      odm.settings = list(model_name = "SVM_CLASSIFICATION_MODEL")
    )
     
    RES <- predict(
      object = MOD,
      newdata = DEMO_DF.test,
      type = c("raw", "class"),
      norm.votes = TRUE,
      cache.model = TRUE,
      supplemental.cols = c(
        "CUST_ID", "AFFINITY_CARD", "EDUCATION",
        "HOUSEHOLD_SIZE", "OCCUPATION", "YRS_RESIDENCE"
      )
    )
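
The split logic in steps 1 and 2 can also be tried locally in base R, without a database connection. The following is a minimal sketch under stated assumptions, not part of the use case itself: demo is a small hypothetical stand-in for DEMO_DF, the fixed seed is an added assumption that makes the random split repeatable, and a logical flag replaces the integer group vector for readability.

```r
# Local, base-R sketch of the 60/40 split from step 1.
# `demo` is a hypothetical stand-in for DEMO_DF; no database connection is used.
set.seed(42)  # assumption: a fixed seed, so the split is reproducible

demo <- data.frame(
  CUST_ID       = 100001:100020,
  AFFINITY_CARD = rep(c(0L, 1L), times = 10)
)

n          <- nrow(demo)
sampleSize <- floor(0.4 * n)                  # 40 percent of the rows go to the test set
index      <- sample(seq_len(n), sampleSize)  # row positions drawn without replacement
in_test    <- seq_len(n) %in% index           # logical flag: TRUE marks a test row

demo.train <- demo[!in_test, ]
demo.test  <- demo[in_test, ]

# Every row lands in exactly one partition.
stopifnot(nrow(demo.train) + nrow(demo.test) == n)
stopifnot(length(intersect(demo.train$CUST_ID, demo.test$CUST_ID)) == 0)

cat("Training rows:", nrow(demo.train), "\n")
cat("Test rows:", nrow(demo.test), "\n")
```

With 20 rows, the sketch puts 12 rows in the train set and 8 rows in the test set. The two stopifnot checks play the same role as the dim calls in step 2: they confirm that the partitions do not overlap and that together they cover every row.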