3.2.1 Load Data
Access the data set from the SH Schema and explore the data to understand the attributes.
Access Data
You will be using the CUSTOMERS
and SUPPLEMENTARY_DEMOGRAPHICS
tables available in the SH schema.
See SH.CUSTOMERS for information about the CUSTOMERS
table in SH
Schema.
The following table displays information about the attributes from SUPPLEMENTARY_DEMOGRAPHICS
:
Attribute Name | Data Type | Information |
---|---|---|
CUST_ID | Numeric | The ID of the customer |
EDUCATION | Character | Education level attained |
OCCUPATION | Character | Occupation of the customer |
HOUSEHOLD_SIZE | Character | Number of people living at residence |
YRS_RESIDENCE | Numeric | Number of years customer lived at current residence |
AFFINITY_CARD | Character | Indicates whether the customer holds an affinity card.
1 means Yes. 0 means No. |
BULK_PACK_DISKETTES | Character | Product. Indicates whether the customer purchased the bulk pack diskettes.
1 means Yes. 0 means No. |
FLAT_PANEL_MONITOR | Character | Product. Indicates whether the customer purchased flat panel monitor.
1 means Yes. 0 means No |
HOME_THEATER_PACKAGE | Character | Product. Indicates whether the customer purchased home theatre package.
1 means Yes. 0 means No |
BOOKKEEPING_APPLICATION | Character | Product. Indicates whether the customer purchased bookkeeping application.
1 means Yes. 0 means No |
PRINTER_SUPPLIES | Character | Product. Indicates whether the customer purchased printer supplies.
1 means Yes. 0 means No |
Y_BOX_GAMES | Character | Product. Indicates whether the customer purchased YBox Games.
1 means Yes. 0 means No |
OS_DOC_SET_KANJI | Character | Product. Indicates whether the customer purchased the Kanji character set for the operating system documentation.
1 means Yes. 0 means No |
COMMENTS | Character | Comments from customers |
To access database data from R using OML4R, you must first create a proxy object in R that represents a database table, view, or query. In this example, the proxy object is created using a query. Create proxy objects for SUPPLEMENTARY_DEMOGRAPHICS and CUSTOMERS and then merge them by inner join on a key column, in this case, CUST_ID. Assess the data to identify data types and data quality issues. Look for missing values, outlier numeric values, or inconsistently labeled categorical values.
- Run the following command in an R interpreter paragraph (using %r) in an OML notebook (or similar notebook environment) to import the Oracle Machine Learning for R libraries and suppress warnings regarding row ordering. Alternatively, this code can be run from the R command line or tools like RStudio.
library(ORE) options(ore.warn.order=FALSE)
- Use the
ore.sync
function to create theore.frame
object that is a proxy for theCUSTOMERS
table in theSH
schema database table. - Use the
ore.sync
function to create theore.frame
object that is a proxy for theSUPPLEMENTARY DEMOGRAPHICS
table in theSH
schema database table.ore.sync(query = c("SUPPLEMENTARY_DEMOGRAPHICS" = "select CUST_ID, HOUSEHOLD_SIZE, YRS_RESIDENCE, TO_CHAR(Y_BOX_GAMES) Y_BOX_GAMES from SH.SUPPLEMENTARY_DEMOGRAPHICS")) # The TO_CHAR function is used to have Y_BOX_GAMES treated as a categorical variable, not a numeric variable. z.show(head(SUPPLEMENTARY_DEMOGRAPHICS))
Parent topic: Clustering Use Case