Creating a Data Quality Scorecard: Motivation and Mechanics
The data quality scorecard presents a picture of data quality levels, showing where they affect the business and where the impact is minor. Data quality rules provide a framework for measuring how closely the data meets business expectations.
The idea behind validating data against the defined data quality rules is to determine the level of conformance. The nature of the rules being validated defines what is being measured. Most major vendors support this kind of validation, so data can be audited and monitored for validity.
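As a rough illustration of measuring conformance, the sketch below computes, for each rule, the fraction of records that satisfy it. The record fields, rule names, and the `conformance_rates` helper are hypothetical and chosen only for this example; they do not reflect any particular vendor's tooling.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def conformance_rates(records: List[Record], rules: Dict[str, Rule]) -> Dict[str, float]:
    """Return, for each named rule, the fraction of records that satisfy it."""
    if not records:
        # Conformance is undefined for an empty data set.
        raise ValueError("at least one record is required")
    total = len(records)
    return {
        name: sum(1 for record in records if rule(record)) / total
        for name, rule in rules.items()
    }


# Hypothetical sample data: one record has no email, one has an invalid age.
records = [
    {"customer_id": 1, "email": "a@example.com", "age": 34},
    {"customer_id": 2, "email": "", "age": 29},
    {"customer_id": 3, "email": "c@example.com", "age": -5},
    {"customer_id": 4, "email": "d@example.com", "age": 51},
]

# Hypothetical rules: each returns True when the record conforms.
rules = {
    "email_present": lambda r: bool(r.get("email")),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

rates = conformance_rates(records, rules)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} conformant")
```

Each rule here is a simple predicate, so what the rate measures depends entirely on how the rule is written, as the text above notes.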
Thresholds of Conformance
Acceptability thresholds are set to measure different levels of expected conformance, because different data flaws have different business impacts and different levels of business criticality. The simplest method uses a single threshold: a breach indicates unacceptable data quality, and anything else counts as acceptable. In a two-step approach, two thresholds are set, one marking data as acceptable and the other marking data as questionable but usable.
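The two schemes can be sketched as a small classification function. This is a minimal example under stated assumptions: the threshold values, the `classify` name, and the treatment of anything below the lower threshold as unacceptable are illustrative choices, not values taken from the text.

```python
def classify(rate: float, acceptable: float, usable: float) -> str:
    """Map a conformance rate (0.0 to 1.0) to a quality band.

    `acceptable` is the upper threshold and `usable` the lower one.
    Passing the same value for both reduces this to the single-threshold case.
    """
    if not 0.0 <= usable <= acceptable <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= usable <= acceptable <= 1")
    if rate >= acceptable:
        return "acceptable"
    if rate >= usable:
        return "questionable but usable"
    # Assumption: anything below the lower threshold is treated as unacceptable.
    return "unacceptable"


# Two-step scheme with hypothetical thresholds of 98% and 90%.
print(classify(0.99, acceptable=0.98, usable=0.90))
print(classify(0.95, acceptable=0.98, usable=0.90))
print(classify(0.80, acceptable=0.98, usable=0.90))

# Single-threshold scheme: both thresholds set to the same value.
print(classify(0.95, acceptable=0.98, usable=0.98))
```

In practice each rule would likely carry its own thresholds, since the business impact of a flaw differs from rule to rule.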
Ongoing Monitoring and Process Control
Tracking overall data quality over time shows how much the system has improved data quality, giving a historical perspective. Statistical process control shows whether data quality falls within an acceptable range relative to historical control bounds. It can also notify data stewards when an exception event occurs and indicate where to look to track down the process causing it. Historical charts are therefore an important part of the scorecard.
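One way to picture the control-bound check is the sketch below, which derives bounds from historical scores and flags new scores that fall outside them. It assumes the common convention of placing the bounds three sample standard deviations from the historical mean; the text does not specify how the bounds are computed, and the scores and function names are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Tuple


def control_limits(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds at k sample standard deviations from the mean."""
    if len(history) < 2:
        raise ValueError("at least two historical scores are required")
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def out_of_control(history: List[float], new_scores: List[float], k: float = 3.0) -> List[int]:
    """Return the indexes of new scores that fall outside the control bounds."""
    lower, upper = control_limits(history, k)
    return [i for i, score in enumerate(new_scores) if score < lower or score > upper]


# Hypothetical daily conformance scores for one rule.
history = [0.97, 0.96, 0.98, 0.97, 0.96, 0.98, 0.97]
new_scores = [0.97, 0.90, 0.96]

lower, upper = control_limits(history)
flagged = out_of_control(history, new_scores)
print(f"control bounds: {lower:.4f} to {upper:.4f}")
print(f"scores needing a data steward's attention: {flagged}")
```

A flagged index identifies when the exception occurred; tracing it back to the responsible process still depends on knowing which rule and data source the score belongs to.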