Dimensions of Acceptable Data
Organizing the rules of data quality into dimensions not only improves the specification and measurement of data quality, it also provides the framework under which quality can be measured and reported. This in turn enables better governance of data quality. Tools can then be built around these dimensions to determine the minimum levels required to meet business expectations and to monitor actual quality levels against those minimums. The same framework also helps with root cause analysis and the eventual mitigation of quality issues.
The dimensions are often defined in line with the contexts in which the metrics associated with the business processes will be measured, and they require continuous management review and oversight. It should be noted, however, that in many cases these are also the dimensions that lend themselves most readily to system automation and are the best candidates for defining data quality monitoring rules.
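To make this concrete, here is a minimal sketch of how dimension-based rules might be automated. The class name, field names, and threshold are illustrative assumptions, not a standard API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class QualityRule:
    dimension: str                      # e.g. "completeness"
    predicate: Callable[[dict], bool]   # True when a record satisfies the rule
    min_pass_rate: float                # business-defined acceptability threshold

def pass_rate(records: list[dict], rule: QualityRule) -> float:
    """Fraction of records that satisfy the rule's predicate."""
    if not records:
        return 1.0
    return sum(rule.predicate(r) for r in records) / len(records)

records = [{"customer_id": 1, "email": "a@x.com"}, {"customer_id": 2, "email": None}]
rule = QualityRule("completeness", lambda r: r.get("email") is not None, 0.95)
rate = pass_rate(records, rule)
print(f"{rule.dimension}: {rate:.0%} (meets threshold: {rate >= rule.min_pass_rate})")
```

Each dimension below can be expressed as a predicate of this kind, with the minimum pass rate set by business expectations.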
The dimensions and their descriptions are listed below:
Consistency – The term does not necessarily imply correctness. It means that two values from different data sets must not be in conflict with each other. Consistency is expressed through constraints: sets of rules that define relationships between attribute values, either within a record or a message, or across all values of an attribute. Consistency can be defined in many contexts, for example between attributes of a single record, between records in one data set, or between data sets.
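As an illustration, a record-level consistency constraint might relate two attributes of the same record. The field names and the rule itself are hypothetical:

```python
from datetime import date

def consistent_dates(record: dict) -> bool:
    """Record-level constraint: an order must not ship before it was placed."""
    return record["ship_date"] >= record["order_date"]

# Two values in conflict with each other: the check flags them.
order = {"order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 8)}
print(consistent_dates(order))  # False
```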
Completeness – This means that certain attributes must be assigned values in a data set. Completeness can be specified at three levels of constraint: mandatory attributes that require a value, optional attributes that may have a value under a certain set of conditions, and inapplicable attributes that must not have a value. Completeness may also be seen as encompassing the usability and appropriateness of data values.
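A sketch covering all three levels, assuming hypothetical order fields (customer_id, status, tracking_no, delivery, ship_address):

```python
def check_completeness(record: dict) -> list[str]:
    """Return violations of the three levels of completeness constraints."""
    violations = []
    # Mandatory: must always carry a value.
    if not record.get("customer_id"):
        violations.append("customer_id is mandatory")
    # Optional under a condition: a shipped order needs a tracking number.
    if record.get("status") == "shipped" and not record.get("tracking_no"):
        violations.append("tracking_no required once shipped")
    # Inapplicable: a digital order should not carry a shipping address.
    if record.get("delivery") == "digital" and record.get("ship_address"):
        violations.append("ship_address inapplicable for digital delivery")
    return violations

print(check_completeness({"customer_id": 7, "status": "shipped",
                          "delivery": "digital", "ship_address": "12 Elm St"}))
```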
Timeliness – This refers to the time expectation for the accessibility and availability of information. It can be measured as the gap between the time the information is expected and the time it is ready for use. In the real world, service levels are defined for the availability of information, indicating how quickly the data must be provided.
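For example, a timeliness check might compare that gap against a service level; the two-hour SLA and the timestamps below are illustrative:

```python
from datetime import datetime, timedelta

SLA = timedelta(hours=2)  # assumed service level: data ready within 2 hours

def is_timely(expected_at: datetime, ready_at: datetime) -> bool:
    """Timeliness: the delay between expectation and availability fits the SLA."""
    return ready_at - expected_at <= SLA

print(is_timely(datetime(2024, 5, 1, 6, 0), datetime(2024, 5, 1, 7, 30)))  # True
```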
Currency – This measures how up to date the information is with respect to the world being modeled; in other words, it indicates whether the information still reflects reality despite time-related changes. Beyond verifying that the data is up to date, data currency also indicates the frequency at which the data is expected to be refreshed. Maintaining currency may require manual as well as automated processes.
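An automated currency check might compare a record's last refresh against the expected refresh frequency; the thirty-day window here is an assumption:

```python
from datetime import datetime, timedelta

REFRESH_EVERY = timedelta(days=30)  # assumed expected refresh frequency

def is_current(last_updated: datetime, now: datetime) -> bool:
    """A value is current if it was refreshed within the expected window."""
    return now - last_updated <= REFRESH_EVERY

print(is_current(datetime(2024, 1, 1), now=datetime(2024, 3, 1)))  # False: stale
```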
Conformance – This refers to whether instances of data are stored, exchanged, or presented in a format that is consistent with their domain of values, i.e. whether they follow the metadata rules assigned to them.
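A conformance check validates values against their assigned domain and format; the state domain and ZIP pattern below stand in for metadata rules and are illustrative:

```python
import re

STATE_DOMAIN = {"CA", "NY", "TX"}              # assumed value domain from metadata
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")  # assumed US ZIP format rule

def conforms(record: dict) -> bool:
    """Values must match the formats and domains their metadata assigns."""
    return (record.get("state") in STATE_DOMAIN
            and bool(ZIP_PATTERN.match(record.get("zip", ""))))

print(conforms({"state": "CA", "zip": "94105"}))  # True
print(conforms({"state": "ZZ", "zip": "9410"}))   # False
```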
Referential Integrity – Unique identifiers are assigned to objects, which simplifies the management of data. This, however, introduces a new expectation: any time an object identifier is used as a foreign key within a data set to refer to some core representation, that core representation must actually exist.
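A referential integrity check scans for foreign keys with no matching core record; the customer and order structures here are hypothetical:

```python
customers = {101: "Ada", 102: "Grace"}          # core representations by identifier
orders = [{"order_id": 1, "customer_id": 101},
          {"order_id": 2, "customer_id": 999}]  # 999 has no core record

def orphan_foreign_keys(orders: list[dict], customers: dict) -> list[dict]:
    """Orders whose customer_id references no existing customer."""
    return [o for o in orders if o["customer_id"] not in customers]

print(orphan_foreign_keys(orders, customers))  # flags the order referencing 999
```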