“Garbage In, Garbage Out”: Importance of Data Quality in Analytics and Its Implications

by Karen Orbista

Posted on 2020-10-30



Data analytics is the process of analyzing raw data to generate statistics, identify patterns, and draw meaningful insights from it. Technology makes such analysis easier, especially for big data, and helps analysts build visualizations that show viewers what the data are trying to convey. But how sure can anyone be that the data are of good quality? And why is it important to check the quality of the data?

 

Ensuring good data quality can be hard, but anyone who analyzes data should take it into account. Accessibility, accuracy, completeness, consistency, relevance, and reliability are some of the factors to consider before, during, and after the analysis.

 

 

Accessibility: Data should be available and easy to obtain. Before drafting a project proposal, people should confirm where they can get all the data the project will need. Failing to do so can waste time and effort, since they may have to formulate another project or look for another source. Other people may also need the data the analyst used, so if the work is published online, the analyst should cite the source or link to the data itself. Note, however, that some data are confidential.

 

 

Accuracy: Data must be accurate. The data gathered must be true and must represent what they are supposed to represent. Statistics and trends are the most common kinds of reports seen in the news, in companies, and in articles, and they make it easy for viewers and listeners to picture the real-life situation. However, not everyone in the audience has a deep understanding of how data should be interpreted. If the data are inaccurate, they can spread misinformation, which leads to false beliefs and may create new problems.

 

 

Completeness: Data must be complete. This does not mean that every single value has to be present, but the information that is crucial to the analysis must be. Although missing entries can be filled in with techniques such as imputation, it is still better for the data to contain the actual values. Complete data also contribute to the accuracy of the results.
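As a rough illustration of checking completeness and filling gaps, here is a minimal Python sketch. The records, the column names, and the helper functions `missing_share` and `impute_mean` are all made up for this example, and mean imputation is only one of several possible imputation methods.

```python
from statistics import mean

# Hypothetical records; None marks a missing entry.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": 41, "income": 61000},
]

def missing_share(rows, column):
    """Return the fraction of rows whose value in `column` is missing."""
    return sum(row[column] is None for row in rows) / len(rows)

def impute_mean(rows, column):
    """Return copies of the rows with missing values replaced by the column mean.

    Assumes at least one observed value exists in the column.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

age_missing = missing_share(records, "age")   # share of rows with no age
completed = impute_mean(records, "age")       # original records stay unchanged
```

Reporting the missing share before imputing keeps the reader aware of how much of the column is estimated rather than observed.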

Consistency: Data must be consistent. Each variable must follow a single format that suits its type. Variable names should also reflect the data they hold, so people can easily tell which variables to include in the analysis. Some variables carry more valuable information than others, and analysts should take that into account. Because data sometimes have to be aggregated from multiple sources, analysts must check for consistency up front so they do not have to recheck the data when problems come up during the analysis.
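One common consistency problem when combining sources is the same variable arriving in different formats. The sketch below, using only Python's standard library, converts dates to a single format and flags the ones it cannot read. The list of formats and the sample values are assumptions for illustration; real sources may need a different list, and a value such as 03/04/2020 is ambiguous unless the day and month order of the source is known.

```python
from datetime import datetime

# Assumed formats used by the hypothetical sources being combined.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")

def to_iso_date(text):
    """Parse text with the first matching known format.

    Returns the date as an ISO string (YYYY-MM-DD), or None if no format fits.
    """
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

raw_dates = ["2020-10-30", "30/10/2020", "October 30, 2020", "30th Oct"]
cleaned = [to_iso_date(d) for d in raw_dates]

# Values that could not be converted need a manual look, not a silent guess.
unparsed = [d for d, c in zip(raw_dates, cleaned) if c is None]
```

Keeping the unparsed values in a separate list makes the inconsistencies visible instead of hiding them.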

 

 

Relevance: Data must be relevant. The data used should be the most up-to-date available so they can be put to efficient use. The data must also be aligned with the project the analyst is working on. Using data that are irrelevant to the topic would be pointless and confusing: one might still derive some insights from them, but they would not provide the information needed to make decisions.

 

 

Reliability: Data must be reliable. Reliability can be measured by examining the consistency of the data. For example, if the analysis is conducted several times using the same method and under the same circumstances, it should produce the same results. It can also be tested by comparing different parts of a study or questionnaire that are meant to measure the same thing; those parts should give similar results. When both hold, the data can be considered reliable, and variation in results caused by external factors can be kept to a minimum.
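The questionnaire check described above is often done as a split-half comparison: score two halves of the items separately and see how strongly the two scores correlate. The following Python sketch shows the idea with made-up responses. The data, the odd/even split, and the `pearson` helper are assumptions for illustration, not a prescribed procedure.

```python
from math import sqrt

def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical questionnaire: one row per respondent, one column per item (scored 1-5).
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]

# Score the 1st and 3rd items as one half, the 2nd and 4th items as the other.
first_half = [row[0] + row[2] for row in responses]
second_half = [row[1] + row[3] for row in responses]

# A value close to 1 suggests both halves measure the same thing.
split_half_r = pearson(first_half, second_half)
```

A low correlation would be a hint that some items measure something different or that the responses are noisy, and that the results should be treated with caution.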

 

In today’s world, industries are continuously evolving, and analytics has become part of that evolution. Many of them, businesses in particular, use data to analyze, predict, and guide most decisions in their daily operations, and to come up with better solutions from the insights the results provide. The factors above are only some of those that must be considered in ensuring good data quality, because many things can affect how well the data perform. People must therefore be extra careful in handling data and make sure they really understand it, because data quality is the foundation of every successful analysis.