Common Issues Resulting in Data Errors
The data input and editing process is one of the most time-consuming and interactive stages of building a dataset, and it is prone to many data errors.
The types of errors that can occur at the time of data input are as follows:
- Incomplete data, i.e., missing points, line segments, or polygons.
- Placement errors, caused by careless digitizing or poor-quality data at the source.
- Distortion, which occurs when source photographs are not correctly registered over the base image or are not at the correct scale.
- Incorrect linkages, such as assigning the wrong label to a feature or assigning more than one label to the same feature. These typically arise when unique labels are entered incorrectly during manual keying or digitizing.
- Missing or duplicate data records. Attribute data often fails to match the spatial data because the two come from independent sources and different time periods, resulting in wrong or incomplete data. Several of these error types are illustrated in the sketch after this list.
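Checks for several of these error types can be automated. The following is a minimal sketch in Python with pandas; the table and column names are hypothetical. It flags features with missing geometry (incomplete data), duplicate labels (incorrect linkages), and records present on only one side of the spatial-attribute linkage (missing records):

```python
import pandas as pd

# Hypothetical spatial and attribute tables; in practice these come from
# independent sources, possibly captured at different times.
spatial = pd.DataFrame({
    "feature_id": ["A1", "A2", "A2", "A4"],  # note the duplicate label A2
    "geometry":   ["POLYGON(...)", None, "LINESTRING(...)", "POINT(...)"],
})
attributes = pd.DataFrame({
    "feature_id": ["A1", "A3", "A4"],        # A3 has no spatial match
    "land_use":   ["residential", "park", "road"],
})

# Incomplete data: features whose geometry is missing entirely.
missing_geometry = spatial[spatial["geometry"].isna()]

# Incorrect linkages: the same label assigned to more than one feature.
duplicate_labels = spatial[spatial["feature_id"].duplicated(keep=False)]

# Missing records: spatial features without attributes, and vice versa.
merged = spatial.merge(attributes, on="feature_id", how="outer", indicator=True)
spatial_only = merged[merged["_merge"] == "left_only"]
attrs_only = merged[merged["_merge"] == "right_only"]

print(missing_geometry, duplicate_labels, spatial_only, attrs_only, sep="\n\n")
```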
Because resolving these errors is time-consuming and labor-intensive (and therefore costly), strong error-correction capabilities are an important part of this process.
Attribute data errors are usually harder to spot and fix than spatial errors, especially when they concern the quality or reliability of the data. Spatial errors, such as topological errors, dead ends, and dangling lines, arise during digitizing, which is the most common source of these issues.
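One common spatial check is to look for dead ends: in a properly connected line network, every segment endpoint should be shared with at least one other segment. Below is a minimal sketch in plain Python with hypothetical coordinate data; real data would also need a snapping tolerance to catch undershoots that miss a junction by a small distance:

```python
from collections import Counter

# Hypothetical digitized line segments as (start, end) coordinate pairs.
segments = [
    ((0.0, 0.0), (1.0, 0.0)),
    ((1.0, 0.0), (1.0, 1.0)),
    ((1.0, 1.0), (0.0, 0.0)),
    ((1.0, 1.0), (2.0, 1.1)),  # ends in mid-air: a potential dead end
]

# Count how often each endpoint occurs across all segments.
endpoint_counts = Counter()
for start, end in segments:
    endpoint_counts[start] += 1
    endpoint_counts[end] += 1

# An endpoint used by only one segment is a dangling node (dead end).
dead_ends = [pt for pt, n in endpoint_counts.items() if n == 1]
print(dead_ends)  # [(2.0, 1.1)]
```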
In the case of master data, the typical issue is segmentation by business unit and product line: the same customer is serviced by separate product lines, each keeping its own record, which encourages redundancy.
Steps for Addressing Data Errors
Errors can be addressed through the following steps:
- Visual review
- Cleanup of lines and junctions
- Weeding of excess coordinates (see the sketch after this list)
- Correction for distortion or warping
- Construction of polygons
- Addition of unique identifiers or labels
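For the coordinate-weeding step, a standard technique is line simplification such as the Douglas-Peucker algorithm. A minimal sketch using the shapely library (assumed to be installed; the coordinates and tolerance are arbitrary examples):

```python
from shapely.geometry import LineString

# A hypothetical over-digitized line with excess, nearly collinear vertices.
line = LineString([(0, 0), (1, 0.01), (2, -0.01), (3, 0.02), (4, 0)])

# Douglas-Peucker simplification: drop vertices that deviate from the
# simplified line by less than the tolerance (in coordinate units).
weeded = line.simplify(tolerance=0.05, preserve_topology=True)

print(len(line.coords), "->", len(weeded.coords))  # 5 -> 2
```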
These steps occur during the data verification stage, which follows data input and precedes data linkage. Verification ensures integrity between the spatial and attribute data.
Another approach is to use a master data management tool to remove duplicates, mass-maintain data, and apply rules that eliminate incorrect data, so that a single authoritative source of data is established.
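As a rough illustration of what such a tool does, the sketch below (plain pandas; the column names and normalization rule are hypothetical) removes duplicate customer records created by product-line segmentation and applies a simple validation rule to drop clearly invalid rows:

```python
import pandas as pd

# Hypothetical customer records kept separately by each product line.
customers = pd.DataFrame({
    "customer_name": ["Acme Corp", "ACME Corp.", "Beta LLC", "Gamma Inc"],
    "product_line":  ["loans", "deposits", "loans", "cards"],
    "email":         ["ops@acme.com", "ops@acme.com", "info@beta.com", None],
})

# Normalize the name so the same customer matches across product lines.
customers["name_key"] = (customers["customer_name"]
                         .str.lower()
                         .str.replace(r"[^a-z0-9]", "", regex=True))

# Remove duplicates: keep one golden record per normalized customer.
golden = customers.drop_duplicates(subset="name_key", keep="first")

# Validation rule: eliminate records with no contact email.
golden = golden[golden["email"].notna()]

print(golden[["customer_name", "product_line", "email"]])
```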