Data Quality Scorecard

Classifying the areas affected by poor data quality yields four impact areas. Within each impact area, several impact parameters can be identified. Together they form a data quality scorecard made up of complex data metrics. The scorecard can then be analyzed from three viewpoints.

The sub-categories of each impact area are described below:

1) Financial

  • Direct Operating Expenses – direct costs, such as labor and raw materials, incurred in fulfilling contractual obligations
  • General overhead – rent, maintenance, asset purchase, utility, licensing, administrative staff and general procurement
  • Staff overhead – costs of the staff required to run a business, such as clerical, sales, management, field supervision, bids and proposals, recruiting and training.
  • Fees and charges – bank fees, service charges, legal, accounting, penalties and fines, bad debt, merger and acquisition costs
  • Cost of goods sold – design, raw materials, production, cost of inventory, inventory planning, marketing, sales, customer management, advertising, lead generation, promotional events, samples, order replacement, order fulfillment and shipping
  • Revenue – customer acquisition, customer retention, churn, missed opportunities
  • Cashflow – delayed and missed customer invoicing, ignored overdue customer payments, quick supplier payments, increased interest rates, EBITDA
  • Depreciation – property market value, inventory markdown
  • Capitalization – value of equity
  • Leakage – collections, fraud, commissions, inter-organizational settlements

2) Confidence and Satisfaction

  • Forecasting – staffing, financial, material requirements, spend vs budget
  • Reporting – timeliness, currency, availability, accuracy, reconciliation needs
  • Decision making – time to decision, predictability
  • Customer satisfaction – cost of selling, retention, buys per customer, items per customer, sales cost, service cost, time to respond, referrals, new product suggestions
  • Supplier management – optimized purchasing, reduced pricing, making acquisitions simple
  • Employee Satisfaction – recruitment costs, hiring, retention, turnover, compensation

3) Productivity

  • Workloads – reconciliation of reports
  • Throughput – Increased time for data gathering and preparation, reduced time for direct data analysis, delays in delivering information products, lengthened production and manufacturing cycles
  • Output quality – reports not trusted
  • Supply chain – not in stock, delays in delivery, missed deliveries, replicated costs

4) Risk and Compliance

  • Regulatory – reporting, protection of private information
  • Industry – processing standards, exchange standards, operational standards
  • Safety – health and occupational hazards
  • Market – competitiveness, goodwill, commodity risk, currency risk, equity risk
  • Financial – loan default risk, investment depreciation, noncompliance penalties
  • System – delays in development and deployment
  • Credit/Underwriting – credit risk, default, capacity, sufficiency of capitalization
  • Legal – legal research, preparation of material
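The classification above can be modeled as a simple two-level hierarchy. Each impact area holds its impact parameters, and each parameter lists the measures it covers. The sketch below assumes that modeling choice. The class and function names (`ImpactArea`, `ImpactParameter`, `build_scorecard`, `area_of`) are hypothetical. Only a few parameters per area are filled in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImpactParameter:
    """A sub-category within an impact area, e.g. "Revenue" under "Financial"."""
    name: str
    measures: list[str] = field(default_factory=list)


@dataclass
class ImpactArea:
    """One of the four impact areas, holding its impact parameters."""
    name: str
    parameters: list[ImpactParameter] = field(default_factory=list)


def build_scorecard() -> list[ImpactArea]:
    """Build an abbreviated scorecard; only a few parameters per area are filled in."""
    return [
        ImpactArea("Financial", [
            ImpactParameter("Revenue", ["customer acquisition", "customer retention",
                                        "churn", "missed opportunities"]),
            ImpactParameter("Leakage", ["collections", "fraud", "commissions",
                                        "inter-organizational settlements"]),
        ]),
        ImpactArea("Confidence and Satisfaction", [
            ImpactParameter("Reporting", ["timeliness", "currency", "availability",
                                          "accuracy", "reconciliation needs"]),
        ]),
        ImpactArea("Productivity", [
            ImpactParameter("Supply chain", ["not in stock", "delays in delivery",
                                             "missed deliveries", "replicated costs"]),
        ]),
        ImpactArea("Risk and Compliance", [
            ImpactParameter("Regulatory", ["reporting",
                                           "protection of private information"]),
        ]),
    ]


def area_of(scorecard: list[ImpactArea], parameter_name: str) -> str | None:
    """Return the name of the impact area that owns a parameter, or None if absent."""
    for area in scorecard:
        if any(p.name == parameter_name for p in area.parameters):
            return area.name
    return None
```

For example, `area_of(build_scorecard(), "Leakage")` returns `"Financial"`. Keeping the hierarchy explicit makes it straightforward to attach measured metrics to each parameter later and to roll them up by impact area.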

Viewpoints for Reporting

The data in the complex data metrics can be reported from three viewpoints: data validation, thresholds of conformance, and ongoing monitoring and process control.

When validating data, rules should be created for each data set based on the range of values it can assume. These rules are commonly derived with data profiling, parsing, standardization and cleansing tools. The best vendors integrate this data quality mechanism directly into their tools. By tracking how often the rules are violated, one can determine the percentage of violations and, conversely, the level of conformance to the defined rules.
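As a minimal sketch of the idea, assume that each validation rule is a numeric range for one field. Missing or out-of-range values count as violations, and each rule's conformance is reported as a percentage of records. `RangeRule` and `conformance_report` are hypothetical names. In practice the ranges would come from profiling the data rather than being written by hand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """Hypothetical validation rule: a numeric column must lie within [low, high]."""
    column: str
    low: float
    high: float

    def is_valid(self, record: dict) -> bool:
        value = record.get(self.column)
        # Missing or non-numeric values count as violations, as do out-of-range ones.
        if not isinstance(value, (int, float)):
            return False
        return self.low <= value <= self.high


def conformance_report(records: list[dict],
                       rules: list[RangeRule]) -> dict[str, tuple[float, float]]:
    """Map each rule's column to (violation %, conformance %) across all records.

    Assumes one rule per column, so the column name is used as the report key.
    """
    if not records:
        return {}
    report = {}
    for rule in rules:
        violations = sum(1 for record in records if not rule.is_valid(record))
        violation_pct = 100.0 * violations / len(records)
        report[rule.column] = (violation_pct, 100.0 - violation_pct)
    return report
```

Usage with invented sample records:

```python
records = [
    {"age": 34, "order_total": 120.0},
    {"age": -2, "order_total": 80.0},
    {"age": 51},
    {"age": 130, "order_total": 15.5},
]
rules = [RangeRule("age", 0, 120), RangeRule("order_total", 0, 10_000)]
print(conformance_report(records, rules))
# {'age': (50.0, 50.0), 'order_total': (25.0, 75.0)}
```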

Thresholds determine whether the number of violations meets user expectations. Typically, a different color is assigned to each threshold band, and the results are displayed using that color coding. Different data flaws have different impacts and represent different levels of business criticality. As a result, the acceptable level of conformance can differ from rule to rule, and the thresholds vary accordingly.
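The sketch below shows how per-rule thresholds could drive a red/yellow/green display. It assumes a three-color scheme and hypothetical rule names and threshold values (`customer_email`, `fax_number`, `THRESHOLDS`). Because a critical rule gets stricter thresholds, a higher conformance can still rank worse than a lower one on a low-impact rule.

```python
from __future__ import annotations

# Hypothetical per-rule thresholds as (yellow_at, green_at) conformance percentages.
# A business-critical rule gets stricter thresholds than a low-impact one.
THRESHOLDS: dict[str, tuple[float, float]] = {
    "customer_email": (90.0, 98.0),  # assumed critical rule
    "fax_number": (50.0, 70.0),      # assumed low-impact rule
}


def status_color(conformance_pct: float, yellow_at: float, green_at: float) -> str:
    """Classify a conformance percentage into a red/yellow/green band."""
    if yellow_at > green_at:
        raise ValueError("yellow_at must not exceed green_at")
    if conformance_pct >= green_at:
        return "green"
    if conformance_pct >= yellow_at:
        return "yellow"
    return "red"


def color_report(conformance: dict[str, float],
                 thresholds: dict[str, tuple[float, float]] = THRESHOLDS) -> dict[str, str]:
    """Assign a display color to each rule's conformance using that rule's thresholds."""
    return {rule: status_color(pct, *thresholds[rule]) for rule, pct in conformance.items()}
```

For example, `color_report({"customer_email": 95.0, "fax_number": 72.0})` returns `{"customer_email": "yellow", "fax_number": "green"}`. The 95% conformance on the critical rule is flagged, while the 72% on the low-impact rule passes.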

Data quality also needs ongoing monitoring and control, because system modifications and updates can affect data quality in different ways. A historical view of data quality shows how the level of conformance has evolved over time.
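One simple way to monitor conformance over time is to keep a dated history of each rule's conformance and flag any period-over-period drop larger than a tolerance. This approach is simpler than formal statistical process control charts. The function name `conformance_drops` and the default tolerance of 2 percentage points are assumptions for illustration.

```python
from __future__ import annotations

from datetime import date


def conformance_drops(history: list[tuple[date, float]],
                      tolerance: float = 2.0) -> list[date]:
    """Return the dates on which conformance fell by more than `tolerance`
    percentage points compared with the previous measurement."""
    ordered = sorted(history)  # tuples sort by date first
    drops = []
    for (_, prev_pct), (day, pct) in zip(ordered, ordered[1:]):
        if prev_pct - pct > tolerance:
            drops.append(day)
    return drops
```

With an invented monthly history of `[(date(2024, 1, 1), 97.5), (date(2024, 2, 1), 97.0), (date(2024, 3, 1), 91.2), (date(2024, 4, 1), 95.8)]`, the function returns `[date(2024, 3, 1)]`. A flagged date can then be compared against the system modifications or updates deployed around that time.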
