4.5.5 Visualize Data in a Scatter Plot

A scatter plot represents the relationship between two numeric variables in a data set. It plots data points on a two-dimensional plane and shows how one variable changes with the other. The independent variable is plotted on the X-axis, while the dependent variable is plotted on the Y-axis. You can group points by one or more grouping variables so that each group has a distinct color and shape.
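To make the grouping idea concrete, the following is a minimal Python sketch, not the notebook's own implementation. It splits rows into one series per group and gives each group its own color and marker. The sample rows, the function names, and the style lists are assumptions made for illustration, and the optional plot_series function assumes the third-party matplotlib library is installed.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical sample rows: (x value, y value, group label).
rows = [
    (1.0, 2.0, "A"),
    (2.0, 2.5, "B"),
    (3.0, 4.1, "A"),
    (4.0, 3.8, "C"),
    (5.0, 6.0, "B"),
]


def split_into_series(rows):
    """Collect the x and y values of each group into its own series."""
    series = defaultdict(lambda: ([], []))
    for x, y, group in rows:
        xs, ys = series[group]
        xs.append(x)
        ys.append(y)
    return dict(series)


def assign_styles(groups):
    """Pair each group with a (color, marker) combination.

    The combinations are distinct for up to four groups; beyond that
    the illustrative lists below repeat.
    """
    colors = cycle(["tab:blue", "tab:orange", "tab:green", "tab:red"])
    markers = cycle(["o", "s", "^", "D"])
    return {g: (c, m) for g, c, m in zip(sorted(groups), colors, markers)}


def plot_series(series, styles, path):
    """Draw one scatter layer per group and save it (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for group, (xs, ys) in sorted(series.items()):
        color, marker = styles[group]
        ax.scatter(xs, ys, color=color, marker=marker, label=group)
    ax.set_xlabel("x (independent variable)")
    ax.set_ylabel("y (dependent variable)")
    ax.legend(title="Group")
    fig.savefig(path)
    plt.close(fig)


series = split_into_series(rows)
styles = assign_styles(series)
```

With matplotlib available, calling plot_series(series, styles, "scatter.png") would write the chart to a file. The grouping step is kept separate from the drawing step so that it can be checked without any plotting library.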

When to use this chart: Use a scatter plot when you have paired numerical data and want to examine the relationship between the two variables. Typical scenarios include identifying correlations and trends (linear and non-linear relationships), detecting outliers, understanding the data distribution, and identifying groupings or clusters of data. Scatter plots are also useful for comparing multiple data sets, where each data set's values are shown as a different group, and for evaluating regression models, for example by plotting actual versus predicted values.
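A linear correlation that looks plausible in a scatter plot can be checked numerically with the Pearson correlation coefficient. The sketch below is one way to compute it in plain Python. The function name pearson_r and the sample values are assumptions for illustration only and do not come from any data set in this guide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired numeric data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up paired values that rise almost linearly together.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(x, y)
```

A value of r close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. A non-linear pattern can still be visible in the plot even when r is small.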
Dataset: CUSTOMER_INSURANCE_LTV. In this example, we will use the example template notebook OML-Run-me-first.
To visualize data in a scatter plot:
  1. In the OML-Run-Me-First notebook, go to the paragraph where you viewed the CUSTOMER_INSURANCE_LTV data. Click the Scatter Plot icon. A default scatter plot appears, which you customize in the following steps.

    Figure 4-27 Toolbar highlighting the Scatter Plot icon


  2. Click the Settings icon. In the Settings dialog, under Setup:
    • Series to show on X-axis: Click and select INCOME.
    • Series to show on Y-axis: Click and select MORTGAGE_AMOUNT.
    • Group By: Select MARITAL_STATUS.
  3. Click Customization:
    • Visualization: Retain the default settings.
    • Description: Under Title, enter "Scatter plot to show the correlation between income and mortgage amount".

    Figure 4-28 Scatter Plot


This completes the task of visualizing your data in a scatter plot. The scatter plot shows a strong correlation between INCOME and MORTGAGE_AMOUNT in the income range 50k to 80k.