6.2 View Data Monitor Results

The Data Monitor Results page displays information about a selected data monitor that has run successfully, along with data drift details for each monitored feature.

On the Data Monitors page, click a data monitor that has run successfully. In this example, the data monitor Power Consumption is selected. The results of the data monitor are displayed on the Data Monitor Results page, which comprises these sections:
  • Settings: The Settings section displays the data monitor settings. Click the arrow next to Settings to expand this section. To edit the data monitor settings, click Edit in the top-right corner of the page. The screenshot shows the settings for the data monitor Power Consumption.

    Figure 6-10 Settings section on the Data Monitor Results page

  • Drift: The Drift section displays the data drift details for each monitored feature. In this example, the Power Consumption data monitor is selected. The X axis shows the analysis period, and the Y axis shows the data drift values. The horizontal dotted line marks the threshold value, and the plotted line shows the drift value at each point in time in the analysis period. Hover over the line to view the drift values.

    Figure 6-11 Data Drift section on the Data Monitor Results page

  • Features: The Features section displays the monitored features along with the computed statistics.

    Figure 6-12 Features section on the Data Monitor Results page

    The value in the Importance column indicates how strongly the feature has contributed to data drift over the specified time period.

    For numerical data, the following statistics are computed:
    • Mean
    • Standard Deviation
    • Range (Minimum, Maximum)
    • Number of nulls
    For categorical data, the following statistics are computed:
    • Number of unique values
    • Number of nulls

    Hover over a monitored feature to view the following additional details, as shown in the screenshot:

    • First: This is the first value of the computed statistics for the analysis period.
    • Last: This is the last value of the computed statistics for the analysis period.
    • Max: This is the highest value of the computed statistics for the analysis period.
    • Min: This is the lowest value of the computed statistics for the analysis period.
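The four hover values above can be illustrated with a minimal sketch. It assumes that a statistic (for example, the mean of a feature) has already been computed once per period and is held in a list ordered by time. The function name `summarize_series` and the sample values are hypothetical and are not part of the product.

```python
def summarize_series(values):
    """Return First, Last, Max, and Min for a time-ordered list of statistic values."""
    if not values:
        raise ValueError("at least one computed value is required")
    return {
        "first": values[0],    # value for the earliest period
        "last": values[-1],    # value for the most recent period
        "max": max(values),    # highest value in the analysis period
        "min": min(values),    # lowest value in the analysis period
    }


# Hypothetical per-period means of one monitored feature.
period_means = [0.12, 0.15, 0.11, 0.18, 0.14]
print(summarize_series(period_means))
```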
  • Click any monitored feature in the Features section to view its Metric, Statistics, Distribution, and Distribution with Crosstab Column details. In the screenshot, the Population Stability Index is shown for the feature GLOBAL_REACTIVE_POWER.

    Figure 6-13 Population Stability Index

    The computations include:
    • Metric: The following metrics are computed:
      • Population Stability Index (PSI): This is a single number that measures how much a population has shifted over time, or how much two samples of a population differ. The two distributions are binned into buckets, and PSI compares the percentage of items in each bucket. PSI is computed as:

        PSI = sum((Actual_% - Expected_%) x ln (Actual_% / Expected_%))

        The interpretation of PSI value is:
        • PSI < 0.1 implies no significant population change
        • 0.1 <= PSI < 0.2 implies moderate population change
        • PSI >= 0.2 implies significant population change
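The PSI formula and the interpretation thresholds above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: it assumes both samples have already been binned into the same buckets and are passed as per-bucket counts. The function names and the small `eps` floor used for empty buckets are assumptions of this sketch.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index for two samples binned into the same buckets."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same buckets")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to fractions. The eps floor (an assumption of this
        # sketch) avoids log(0) and division by zero for empty buckets.
        expected_pct = max(e / expected_total, eps)
        actual_pct = max(a / actual_total, eps)
        total += (actual_pct - expected_pct) * math.log(actual_pct / expected_pct)
    return total


def interpret_psi(value):
    """Map a PSI value to the interpretation bands listed above."""
    if value < 0.1:
        return "no significant population change"
    if value < 0.2:
        return "moderate population change"
    return "significant population change"


# Hypothetical bucket counts for a baseline period and a new period.
baseline = [100, 200, 400, 200, 100]
current = [80, 150, 380, 250, 140]
score = psi(baseline, current)
print(round(score, 4), interpret_psi(score))
```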
      • Jensen-Shannon Distance (JSD): This is a measure of the similarity between two probability distributions. JSD is the square root of the Jensen-Shannon Divergence, which is related to the Kullback-Leibler Divergence (KLD). JSD is computed as:

        JSD(P || Q) = sqrt(0.5 x KLD(P || M) + 0.5 x KLD(Q || M))

        where P and Q are the two distributions, M = 0.5 x (P + Q), KLD(P || M) = sum(Pi x ln(Pi / Mi)), and KLD(Q || M) = sum(Qi x ln(Qi / Mi))

        The value of JSD ranges between 0 and 1.
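The JSD formula above can be sketched as follows. This is an illustration under stated assumptions, not the product's implementation: P and Q are passed as lists of probabilities over the same buckets, each summing to 1, and terms with Pi = 0 are skipped (the usual convention that 0 x ln(0) is 0). The function names are hypothetical. Because the formula uses the natural logarithm, the largest value this sketch can return is sqrt(ln 2), about 0.83; with a base-2 logarithm the maximum would be exactly 1.

```python
import math


def kld(p, m):
    """Kullback-Leibler Divergence sum(Pi x ln(Pi / Mi)), skipping Pi = 0 terms."""
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon Distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("both distributions must use the same buckets")
    # M is the pointwise average of P and Q, so Mi > 0 whenever Pi > 0 or Qi > 0.
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    divergence = 0.5 * kld(p, m) + 0.5 * kld(q, m)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical bucket probabilities for a baseline period and a new period.
p = [0.10, 0.20, 0.40, 0.20, 0.10]
q = [0.08, 0.15, 0.38, 0.25, 0.14]
print(round(jsd(p, q), 4))
```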

      • Crosstab Population Stability Index: This is the PSI for two variables.
      • Crosstab Jensen-Shannon Distance: This is the JSD for two variables.
    • Statistics: You can view statistics for up to 3 selected periods. Data drift is quantified using these statistical computations.

      Figure 6-14 Statistics

      For numerical data, the following statistics are computed:
      • Mean
      • Standard Deviation
      • Range (Minimum, Maximum)
      • Number of nulls
      For categorical data, the following statistics are computed:
      • Number of unique values
      • Number of nulls
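The statistics listed above can be sketched with plain Python. This is a rough illustration, not the product's computation: nulls are represented as `None`, the standard deviation uses the sample form (n - 1 denominator), and the function names are hypothetical. Whether the product uses the sample or the population standard deviation is not stated here, so treat that choice as an assumption.

```python
import math


def numeric_stats(values):
    """Mean, standard deviation, range, and null count for a numerical feature."""
    present = [v for v in values if v is not None]
    n = len(present)
    mean = sum(present) / n if n else None
    # Sample standard deviation (n - 1 denominator); an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in present) / (n - 1)) if n > 1 else None
    return {
        "mean": mean,
        "std": std,
        "min": min(present) if n else None,
        "max": max(present) if n else None,
        "nulls": len(values) - n,
    }


def categorical_stats(values):
    """Number of unique values and null count for a categorical feature."""
    present = [v for v in values if v is not None]
    return {"unique": len(set(present)), "nulls": len(values) - len(present)}


# Hypothetical feature values for one period.
print(numeric_stats([1.0, 2.0, None, 3.0]))
print(categorical_stats(["a", "b", "a", None]))
```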
    • Distribution: The feature distribution chart, with its legend, displays the bins of the feature for the selected periods and, optionally, for the baseline.

      Figure 6-15 Distribution Chart and Distribution with Crosstab column

    • Distribution with Crosstab Column: The heat map indicates the density of the distribution for the selected crosstab column and the feature column. Red denotes the highest density.

      Note:

      In data drift monitoring, nulls are tracked separately as number_of_missing_values.