Key Dimensions that Characterize Acceptable Data

Organizing data quality rules into dimensions improves how data quality is specified, and it provides the framework within which quality can be measured and reported. This in turn enables better governance of data quality. Tools can then be built around these dimensions to determine the minimum levels required to meet business expectations and to monitor actual data quality levels against them. The same framework also supports root cause analysis and the eventual mitigation of data quality issues.

The dimensions are often defined in line with the contexts in which the metrics associated with business processes will be measured. These dimensions require continuous management review and oversight. However, in many cases they are also the dimensions that lend themselves most readily to system automation and are best suited to defining rules for data quality monitoring.

The dimensions and their descriptions are listed below:

Uniqueness – Asserting the uniqueness of entities means that no entity is recorded more than once and that a unique key identifies each entity in the system. Each entity is captured and represented exactly once within the relevant application architectures. It follows that a new data instance must not be created when a record for that entity already exists. Apart from running duplicate analysis on existing data, this implies providing an identity matching and resolution service at the time of record creation.
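
The two uniqueness controls described above, duplicate analysis and identity matching at record creation, can be sketched as follows. This is a minimal illustration, not a production matching service: the `CustomerRegistry` class, the `normalize` match key (name plus email, ignoring case and extra whitespace) and the record fields are all assumptions made for the example.

```python
from dataclasses import dataclass, field


def normalize(name: str, email: str) -> tuple:
    """Hypothetical match key: ignores case and surplus whitespace."""
    return (" ".join(name.lower().split()), email.strip().lower())


@dataclass
class CustomerRegistry:
    """Toy in-memory store that never creates a second record for an entity."""
    _by_match_key: dict = field(default_factory=dict)
    _next_id: int = 1

    def resolve_or_create(self, name: str, email: str) -> tuple:
        """Return (entity_id, created); an existing match is reused."""
        key = normalize(name, email)
        if key in self._by_match_key:
            return self._by_match_key[key], False
        entity_id = self._next_id
        self._next_id += 1
        self._by_match_key[key] = entity_id
        return entity_id, True


def find_duplicates(records: list) -> dict:
    """Duplicate analysis on existing data: group record ids by match key."""
    groups = {}
    for rec in records:
        key = normalize(rec["name"], rec["email"])
        groups.setdefault(key, []).append(rec["id"])
    # Keep only the keys shared by more than one record.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Calling `resolve_or_create` twice with differently formatted versions of the same name and email returns the same identifier, with the second call reporting that nothing new was created. Real identity resolution usually needs fuzzier matching than an exact normalized key.
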
Accuracy – This refers to the extent to which data correctly represents the real-life objects it is intended to model. Accuracy is often measured by how well values agree with an identified source of correct information, such as reference data. Other sources of correct information include a database of record, a corroborating set of data values from another table, dynamically computed values, or the result of a manual process.
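
One way to turn accuracy into a measurable figure is to compare an attribute against a trusted source and report the share of records that agree. The sketch below assumes the trusted source is available as a simple lookup from record key to correct value; the function name `accuracy_rate` and its parameters are illustrative, not taken from any particular tool.

```python
def accuracy_rate(records, reference, key, attribute):
    """Compare one attribute against a trusted reference lookup.

    Returns (rate, mismatched_keys). Records with no reference value are
    skipped, because their accuracy cannot be assessed from this source.
    """
    checked = 0
    mismatches = []
    for rec in records:
        expected = reference.get(rec[key])
        if expected is None:
            continue  # no trusted value available for this record
        checked += 1
        if rec[attribute] != expected:
            mismatches.append(rec[key])
    # Rate is undefined (None) when nothing could be checked.
    rate = (checked - len(mismatches)) / checked if checked else None
    return rate, mismatches
```

Skipping records that have no reference value is a design choice: it keeps the accuracy rate from being distorted by gaps in the reference source, which are better reported separately.
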
Consistency – The term does not necessarily imply correctness. It means that two values drawn from different data sets must not conflict with each other. Consistency can be expressed as constraints: sets of rules that define relationships between attribute values, whether within a record or message, or across all values of a single attribute. Consistency can be defined in several contexts:
- Record level i.e. within the same record
- Between different records i.e. cross-record consistency
- Temporal i.e. across different points in time
- Reasonableness i.e. values must fall within an acceptable range
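
Each of these consistency contexts can be written as a small rule. The sketch below uses a hypothetical order data set; the field names (`ordered`, `shipped`, `total`, `amount`, `order_id`) and the acceptable range for totals are assumptions chosen only for illustration. Amounts are integers (for example, cents) so that equality checks are exact.

```python
from datetime import date


def record_level(order):
    """Within one record: the ship date must not precede the order date."""
    return order["shipped"] >= order["ordered"]


def cross_record(order, line_items):
    """Across records: the order total must equal the sum of its line items."""
    item_sum = sum(i["amount"] for i in line_items if i["order_id"] == order["id"])
    return order["total"] == item_sum


def temporal(snapshots):
    """Across time: a cumulative counter must never decrease."""
    return all(a <= b for a, b in zip(snapshots, snapshots[1:]))


def reasonable(order, low=0, high=10_000_000):
    """Reasonableness: the total must fall inside an assumed acceptable range."""
    return low <= order["total"] <= high


order = {"id": 7, "ordered": date(2024, 3, 1), "shipped": date(2024, 3, 4), "total": 2500}
items = [
    {"order_id": 7, "amount": 1000},
    {"order_id": 7, "amount": 1500},
    {"order_id": 8, "amount": 999},
]
```

Note that every rule here can pass while the data is still wrong, for example if both the total and the line items are misstated by the same amount. That is the sense in which consistency does not imply correctness.
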
Completeness – This means that certain attributes must be assigned values in a data set. Completeness rules can be assigned at three levels of constraint: mandatory attributes, which require a value; optional attributes, which may have a value under a certain set of conditions; and inapplicable attributes, which must not have a value. Completeness may also be seen as encompassing the usability and appropriateness of data values.
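
The three levels of completeness constraint can be sketched as a single check that returns a list of violations. The account schema used here (`account_id`, `status`, `closed_on`, `account_type`, `company_name`) and the conditions attached to each attribute are hypothetical and serve only to illustrate the three rule types.

```python
def is_missing(value):
    """Treat None and blank strings as 'no value assigned'."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness_violations(record):
    """Check one record against mandatory, optional and inapplicable rules."""
    violations = []

    # Mandatory: these attributes always require a value.
    for attr in ("account_id", "status"):
        if is_missing(record.get(attr)):
            violations.append(f"{attr}: mandatory value missing")

    # Optional: closed_on may carry a value only when the account is closed.
    if not is_missing(record.get("closed_on")) and record.get("status") != "closed":
        violations.append("closed_on: value present but account is not closed")

    # Inapplicable: individual accounts must not carry a company name.
    if record.get("account_type") == "individual" and not is_missing(record.get("company_name")):
        violations.append("company_name: not applicable to individual accounts")

    return violations
```

A record that passes returns an empty list, so the same function can feed a monitoring rule that counts violations per attribute over time.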