Dashboard Indexes
Indexes aggregate Rule Results, though it is also possible to aggregate indexes hierarchically to create an index of indexes. For example, a data quality index could be constructed for each of a number of source systems, or each of a number of types of data (customer, product etc.). An overall data quality index could then be constructed as an aggregation of these indexes.
Indexes are always configured in Dashboard Administration.
Index Calculation
The index value means little in isolation. However, as the score is calculated from the results of a number of executions of a process or processes (over time), trend analysis will allow the business user to monitor whether the index has gone up or down. This is analogous to monitoring the FTSE100 index.
A higher index value represents a higher data quality score. By default, a 'perfect' DQ index score is 1000.
Index of Rule Results
Where an index is made up of a number of Rule Results, it is calculated as a weighted average across the contributing results.
For example, a Customer Data DQ index may be made up of the following Rule Results and Weightings:
Table 1-8 Rule Results and Weightings
Contributing Rule | Weighting |
---|---|
Validate email address |
12.5% |
Validate address |
25% |
Title/gender mismatches |
37.5% |
Validate name |
25% |
In this configuration, the Validate address and Validate name rules have the default weighting of 25% (a quarter of the overall weight across four rules), but the administrator has specified different weightings for the other rules – the Validate email address rule is interpreted as less important, and the Title/Gender mismatch as more important.
The actual index score is then calculated as a weighted average across internally calculated index scores for each contributing rule.
For each rule, an index score out of 1000 (or the configured base perfect score) is calculated as follows, where 10 points are awarded for a pass, 5 points for a warning, and no points are awarded for an alert:
(((# of passes * 10) + (# of warnings * 5)) / (# of checks *10)) * 1000
For example, if the results of the contributing rules are as follows:
Table 1-9 Results of Contributing Rules
Rule | Checks | Passes | Warnings | Alerts |
---|---|---|---|---|
Validate email address |
1000 |
800 (80%) |
100 (10.0%) |
100 (10.0%) |
Validate address |
1000 |
800 (80%) |
0 (0%) |
200 (20.0%) |
Title/gender mismatches |
1000 |
800 (80%) |
0 (0%) |
200 (20.0%) |
Validate name |
1000 |
800 (80%) |
0 (0%) |
200 (20.0%) |
The index scores of each contributing rule will be as shown below:
Table 1-10 Index Scores
Rule | Index Score Calculation | Index Score |
---|---|---|
Validate email address |
800 passes * 10 points = 8000 + 100 warnings * 5 points = 500 Total = 8500 1000 checks * 10 = 10000 8500/10000 = 0.85 * 1000 = 850 |
850 |
Validate address |
800 passes * 10 points = 8000+ 0 warnings * 5 points = 0 Total = 8000 1000 checks * 10 = 10000 8000/10000 = 0.8 * 1000 = 800 |
800 |
Title/gender mismatches |
800 passes * 10 points = 8000+ 0 warnings * 5 points = 0 Total = 8000 1000 checks * 10 = 10000 8000/10000 = 0.8 * 1000 = 800 |
800 |
Validate name |
800 passes * 10 points = 8000+ 0 warnings * 5 points = 0 Total = 8000 1000 checks * 10 = 10000 8000/10000 = 0.8 * 1000 = 800 |
800 |
The overall index score is then calculated using the weightings, as follows:
Validate email address score (850) * Validate email address weight (0.125) = 106.25 + Validate address score (800) * Validate address weight (0.25) = 200 + Title/gender mismatch score (800) * Title/gender mismatch weight (0.375) = 300 + Validate name score (800) * Validate name weight (0.25) = 200
The total Customer Data DQ index score is 806.25, and is rounded up to 806.3 for display purposes.
Index of Indexes
If an index is created to aggregate other indexes, the index is calculated simply as a weighted average of the contributing indexes. For example, the user might set up an index across a number of other indexes as follows:
Table 1-11 Contributing Indexes
Contributing Index | Weighting |
---|---|
Customer data index |
50% |
Contact data index |
25% |
Order data index |
25% |
If the index values of each indexes are as follows:
Table 1-12 Weighted Average of Contributing Indexes
Contributing Index | Index Score |
---|---|
Customer data index |
825.0 |
Contact data index |
756.8 |
Order data index |
928.2 |
The index would be calculated as follows:
Customer data index (825) * Customer data index weight (0.50) = 412.5 + Contact data index (756.8) * Contact data index weight (0.25) = 189.2 + Order data index (928.2) * Order data index weight (0.25) = 232.5
The overall data quality index would have a value of 834.2.
Indexes of Staggered Audit Results
Indexes may aggregate results from a number of processes. Normally, it is expected that this form of aggregation will be used when the processes are executed at the same intervals. However, this cannot be guaranteed. In some cases, the processes contributing to an index will be out of step. For example, two data quality audit processes are executed. An index is configured to aggregate rule results from both processes, and results for the index history are published as follows:
Table 1-13 Results for Index History
Date | Results from Customer audit process run on | Results from Contact audit process run on |
---|---|---|
12/06/05 |
12/06/05 |
12/06/05 |
13/06/05 |
13/06/05 |
12/06/05 |
14/06/05 |
13/06/05 |
14/06/05 |
15/06/05 |
15/06/05 |
14/06/05 |
16/06/05 |
16/06/05 |
16/06/05 |
This works by recalculating the results for the index every time one of its contributing processes is run. The results from the last run of each process are then used, and any previously calculated index results for a distinct date (day) are overwritten.