3.2 Clustering Use Case
A retail store has information about its customers' behavior and the purchases they make. Using that data, the store would like you to determine whether there are groups of customers with similar characteristics. Use Oracle Machine Learning to segment customers by finding clusters in the data set that can then be used to support targeted marketing campaigns and increase retail sales. In this use case, you will learn how to identify such segments using the k-Means algorithm.
Data Understanding
To understand the data, perform the following tasks:
- Access data
- Explore data
- Load Data
Access the data set from the SH Schema and explore the data to understand the attributes.
- Explore Data
Once the data is accessible, explore it to understand and assess its quality. At this stage, identify the data types and any noise in the data, and look for missing values and numeric outliers.
- Build Model
To evaluate a model's performance, it is common practice to split the data into training and test sets, which lets you assess how well the model generalizes to unseen data. In unsupervised learning such as clustering, however, there are no labels or target values against which to calculate accuracy. Because there is no ground truth to compare the results with, a training-test split is not applicable, and you can use the entire data set to build the model.
- Deploy the Model
This topic describes several approaches to deploying your OML4R model and leveraging its insights.
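As an illustration of the Explore Data and Build Model steps described in this section, the following R sketch checks a small data set for missing values and numeric outliers, then clusters all remaining rows with k-means and no training-test split. It is a minimal sketch under stated assumptions, not the procedure used in this use case. The `customers` data frame, its column names, and the injected missing value and outlier are hypothetical stand-ins for the SH schema data. The 1.5 x IQR rule is one common outlier heuristic. Base R's `kmeans()` from the stats package stands in for the in-database k-Means algorithm that the use case runs through OML4R.

```r
# Hypothetical stand-in for the customer data; the actual use case reads
# its data from the SH schema, which is not shown here.
set.seed(42)
customers <- data.frame(
  age          = c(rnorm(50, mean = 30, sd = 4),   rnorm(50, mean = 55, sd = 5)),
  yearly_spend = c(rnorm(50, mean = 500, sd = 80), rnorm(50, mean = 2000, sd = 150))
)
customers$yearly_spend[3] <- NA   # inject one missing value (assumption)
customers$age[10] <- 120          # inject one numeric outlier (assumption)

# Explore: inspect data types and count missing values per column.
str(customers)
missing_counts <- colSums(is.na(customers))

# Flag numeric outliers with the 1.5 * IQR rule (one common heuristic).
is_outlier <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}
outlier_counts <- sapply(customers, function(col) sum(is_outlier(col)))

# Keep complete rows that are not flagged as outliers in either column.
outlier_rows <- is_outlier(customers$age) | is_outlier(customers$yearly_spend)
clean <- customers[complete.cases(customers) & !outlier_rows, ]

# Build: cluster the entire cleaned data set. No training-test split is
# made because there are no labels to score against.
scaled <- scale(clean)                             # put columns on a common scale
model  <- kmeans(scaled, centers = 2, nstart = 10) # base R stand-in for OML4R k-Means

# Summarize each segment by its size and its mean attribute values.
print(table(model$cluster))
print(aggregate(clean, by = list(cluster = model$cluster), FUN = mean))
```

The number of clusters (`centers = 2`) is chosen here only because the synthetic data was generated from two groups. With real customer data, the value would need to be selected and justified separately.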