3.2.1 Load Data

Access the data set from the SH Schema and explore the data to understand the attributes.

Access Data

You will be using the CUSTOMERS and SUPPLEMENTARY_DEMOGRAPHICS tables available in the SH schema.

See SH.CUSTOMERS for information about the CUSTOMERS table in SH Schema.

The following table displays information about the attributes from SUPPLEMENTARY_DEMOGRAPHICS:

Attribute Name Data Type Information
CUST_ID Numeric The ID of the customer
EDUCATION Character Education level attained
OCCUPATION Character Occupation of the customer
HOUSEHOLD_SIZE Character Number of people living at residence
YRS_RESIDENCE Numeric Number of years customer lived at current residence
AFFINITY_CARD Character Indicates whether the customer holds an affinity card.

1 means Yes. 0 means No.

BULK_PACK_DISKETTES Character Product. Indicates whether the customer purchased the bulk pack diskettes.

1 means Yes. 0 means No.

FLAT_PANEL_MONITOR Character Product. Indicates whether the customer purchased flat panel monitor.

1 means Yes. 0 means No

HOME_THEATER_PACKAGE Character Product. Indicates whether the customer purchased home theatre package.

1 means Yes. 0 means No

BOOKKEEPING_APPLICATION Character Product. Indicates whether the customer purchased bookkeeping application.

1 means Yes. 0 means No

PRINTER_SUPPLIES Character Product. Indicates whether the customer purchased printer supplies.

1 means Yes. 0 means No

Y_BOX_GAMES Character Product. Indicates whether the customer purchased YBox Games.

1 means Yes. 0 means No

OS_DOC_SET_KANJI Character Product. Indicates whether the customer purchased the Kanji character set for the operating system documentation.

1 means Yes. 0 means No

COMMENTS Character Comments from customers

To access database data from R using OML4R, you must first create a proxy object in R that represents a database table, view, or query. In this example, the proxy object is created using a query. Create proxy objects for SUPPLEMENTARY_DEMOGRAPHICS and CUSTOMERS and then merge them by inner join on a key column, in this case, CUST_ID. Assess the data to identify data types and data quality issues. Look for missing values, outlier numeric values, or inconsistently labeled categorical values.

  1. Run the following command in an R interpreter paragraph (using %r) in an OML notebook (or similar notebook environment) to import the Oracle Machine Learning for R libraries and suppress warnings regarding row ordering. Alternatively, this code can be run from the R command line or tools like RStudio.
    library(ORE)
    options(ore.warn.order=FALSE)
  2. Use the ore.sync function to create the ore.frame object that is a proxy for the CUSTOMERS table in the SH schema database table.
    This image shows first few rows of Customers data set.

  3. Use the ore.sync function to create the ore.frame object that is a proxy for the SUPPLEMENTARY DEMOGRAPHICS table in the SH schema database table.
    ore.sync(query = c("SUPPLEMENTARY_DEMOGRAPHICS" = "select CUST_ID, HOUSEHOLD_SIZE, YRS_RESIDENCE, TO_CHAR(Y_BOX_GAMES) Y_BOX_GAMES from SH.SUPPLEMENTARY_DEMOGRAPHICS"))
    # The TO_CHAR function is used to have Y_BOX_GAMES treated as a categorical variable, not a numeric variable.
    z.show(head(SUPPLEMENTARY_DEMOGRAPHICS))

    This images shows the first few rows of Supplementary Demographics Data set.