Data Quality Inspection and Validation
Data quality inspection, also called data quality assessment, is a means of distinguishing good data from bad. It is an objective review in which analysts quantitatively examine what data populates which data sets. An inspection may not uncover every discrepancy, but the instances it records should point to the underlying data quality issues. Data profiling, which summarizes each field with statistics such as record counts, missing values, distinct values, and value ranges, is another way of achieving this.
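As an illustration of profiling, the sketch below summarizes a single column with a few basic statistics. `profile_column` is a hypothetical helper and the sample values are made up; real profiling tools report many more measures. The point is that a surprising statistic, such as a maximum far above the rest, is the kind of lead an inspection looks for.

```python
from typing import Any, Iterable, Optional


def profile_column(values: Iterable[Optional[float]]) -> dict[str, Any]:
    """Summarize one column of data (hypothetical, minimal profiling helper)."""
    values = list(values)
    present = [v for v in values if v is not None]  # skip missing entries for the stats
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Made-up sample column with one missing value and one suspicious outlier (999).
ages = [34, 41, None, 58, 30, 75, 41, 999]
print(profile_column(ages))
```

Here the profile reports one missing value and a maximum of 999, both of which would be worth investigating.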
Data validation means confirming that data is accurate and that it conforms to expectations. Transformation rules and profiling rules both provide a mechanism for validation, and the resulting rules can be tested against large volumes of data. For example, if profiling determines that the values in a field should lie between 30 and 75, a validation rule can be defined with 30 as the lowest permitted value and 75 as the highest. As data flows through the system, the rule is used to verify that each value falls within that range.
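The range rule from the example can be sketched in Python as follows. `RangeRule`, `derive_rule`, and `validate` are hypothetical names, not part of any particular library. `derive_rule` assumes the profiled sample is clean, so its observed minimum and maximum can be used directly as the bounds; in practice the bounds might instead come from domain knowledge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical validation rule: every value must lie in [low, high]."""
    low: float
    high: float

    def check(self, value: float) -> bool:
        # Inclusive bounds: the endpoints themselves are valid.
        return self.low <= value <= self.high


def derive_rule(profiled_sample):
    """Build a range rule from profiled data (assumes the sample is clean)."""
    return RangeRule(low=min(profiled_sample), high=max(profiled_sample))


def validate(values, rule):
    """Return (index, value) pairs for every value that breaks the rule."""
    return [(i, v) for i, v in enumerate(values) if not rule.check(v)]


# Profiling this made-up sample yields the 30 to 75 range used in the text.
rule = derive_rule([30, 52, 75, 41])
incoming = [30, 45, 75, 29, 80, 60]
print(validate(incoming, rule))  # → [(3, 29), (4, 80)]
```

Returning the offending positions rather than a single pass/fail flag makes it easier to trace each violation back to its source record.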