How to Measure Data Quality – 7 Metrics to Assess the Quality of Your Data
Businesses today are increasingly dependent on an ever-growing flood of information. Whether it is sales records, financial and accounting data, or sensitive customer information, the accuracy of that data, and a company’s ability to measure its quality, are critical. If portions of that information are inaccurate or incomplete, the effect on the organization can range from embarrassing to catastrophic.
What Is Data Quality?
Data quality refers to the ability of a set of data to serve an intended purpose. Today’s businesses use data to generate value in myriad ways, but they simply can’t accomplish their objectives using low-quality data. We often describe data quality in terms of the following four dimensions:
- Completeness refers to the presence of all required information within a dataset. For example, if the customer information in a database is required to include both first and last names, any record in which the first name or last name field is not populated is considered incomplete.
- Validity describes the conformance of data to business rules such as the format (e.g. number of digits), allowable data types (integer, floating-point, string, etc.), and range (minimum and maximum values). For example, a telephone number field that contains the string ‘1809 Oak Street’ is not valid. (A short code sketch after this list illustrates checks like these.)
- Timeliness refers to whether the information is sufficiently up-to-date for its intended use. Is the correct information available when needed? If a customer has notified your company of an address change, but that information is not available when billing statements are processed, that indicates a problem with the timeliness of the data.
- Consistency is present when all representations of a particular item across multiple data stores match. If customer information is stored in both the ERP system and a separate CRM system, for example, it’s important that the address, order history, and other important details match.
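To make the completeness and validity dimensions concrete, here is a minimal Python sketch. The customer records, field names, and phone-number format rule are hypothetical, invented purely for illustration:

```python
# A minimal sketch of completeness and validity checks, assuming a
# hypothetical customer table with first_name, last_name, and phone fields.
import re

customers = [
    {"first_name": "Ada", "last_name": "Lovelace", "phone": "555-0100"},
    {"first_name": "", "last_name": "Turing", "phone": "555-0199"},             # incomplete
    {"first_name": "Grace", "last_name": "Hopper", "phone": "1809 Oak Street"}, # invalid
]

# Assumed business rule for this example: seven digits with a dash.
PHONE_PATTERN = re.compile(r"^\d{3}-\d{4}$")

def is_complete(record: dict) -> bool:
    """Completeness: both name fields must be populated."""
    return bool(record["first_name"]) and bool(record["last_name"])

def is_valid(record: dict) -> bool:
    """Validity: the phone field must match the expected format."""
    return bool(PHONE_PATTERN.match(record["phone"]))

for rec in customers:
    print(rec["phone"], "complete:", is_complete(rec), "valid:", is_valid(rec))
```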
7 Metrics to Assess Data Quality
To measure data quality – and track the effectiveness of data quality improvement efforts – you need, well, data. What does data quality assessment look like in practice? Following are seven examples.
| Metric | Definition | How to calculate |
|---|---|---|
| Ratio of Data to Errors | How many errors do you have relative to the size of your data set? | Divide the total number of known errors by the total number of records. |
| Number of Empty Values | Empty values indicate information is missing from a data set. | Count the number of empty fields within a data set, then track that count over time. |
| Data Transformation Error Rates | How many errors arise as you convert information into a different format? | Divide the number of failed transformation operations by the total number of transformation operations. |
| Amounts of Dark Data | How much information is unusable due to data quality problems? | Quantify the volume of data you collect and store but never use, and the share of it with known quality problems. |
| Email Bounce Rates | What percentage of recipients didn’t receive your email because it went to the wrong address? | Divide the total number of emails that bounced by the total number of emails sent, then multiply by 100. |
| Data Storage Costs | How much does it cost to store your data? | Track what your storage provider charges you relative to the volume of data you actually use. |
| Data Time-to-Value | How long does it take for your firm to get value from its information? | Define what “value” means to your firm, then measure the time from data acquisition to achieving it. |
Let’s look at each of these metrics in a bit more detail:
1. The ratio of data to errors
This ratio offers a straightforward, quantitative way to measure data quality. Briefly stated, it consists of tracking the number of known errors – such as missing, incomplete, or redundant entries – within a data set relative to the overall size of the data set. If you find fewer errors while the size of your data stays the same or grows, you know that your data quality is improving. The disadvantage of this approach is that there might be errors of which you aren’t even aware. You don’t know what you don’t know. In this respect, the ratio of data to errors can potentially provide an overly optimistic view of data quality.
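As a rough illustration of the calculation, here is a minimal Python sketch; the record and error counts are invented for the example:

```python
# A minimal sketch: track known errors relative to data set size over time.
# The counts below are illustrative, not real data.

def error_ratio(known_errors: int, total_records: int) -> float:
    """Ratio of known errors to total records; lower is better."""
    if total_records == 0:
        raise ValueError("data set is empty")
    return known_errors / total_records

# Comparing two monthly snapshots: the data set grew while known errors fell,
# so measured quality is improving (subject to errors you don't know about).
print(f"January:  {error_ratio(120, 10_000):.4f}")  # 0.0120
print(f"February: {error_ratio(95, 12_500):.4f}")   # 0.0076
```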
2. Number of empty values
Empty values often indicate that important information is missing, or that someone has used the wrong field to record it. This is a relatively easy data quality problem to track. You simply need to quantify the number of records within a data set containing empty fields and then monitor that value over time. It’s important, of course, to focus on data fields that significantly contribute to overall value. An optional memo field, for example, might not be a good indicator of data quality, whereas an essential value like a zip code or phone number corresponds more closely to the overall completeness of data sets.
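Here is a minimal sketch of this count using pandas (one possible tooling choice, not a requirement); the data frame, field names, and choice of required fields are assumptions for illustration:

```python
# A minimal sketch: count empty values in the fields that matter most.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "zip_code":    ["30301", None, "10001", ""],
    "memo":        [None, None, "call back", None],  # optional field, excluded
})

# Assumed: zip_code is essential to the data set's purpose; memo is not.
required_fields = ["zip_code"]

# Treat empty strings like missing values for the required fields.
empty_counts = df[required_fields].replace("", pd.NA).isna().sum()
print(empty_counts)            # empty values per required field
print(empty_counts / len(df))  # as a share of all records, to monitor over time
```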
3. Data transformation error rates
Problems with data transformation – that is, the process of taking data that is stored in one format and converting it to a different format – are often a sign of data quality problems. If a required field is null, or if it contains an unexpected value that does not conform to business rules, then it’s likely to trigger an error during the transformation process. By measuring the number of data transformation operations that fail (or take unacceptably long to complete) you can gain insight into the overall quality of your data.
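A minimal sketch of this measurement in Python, using string-to-number conversion as a stand-in for a real transformation pipeline; the source values are invented:

```python
# A minimal sketch: measure the share of records that fail a transformation.
# Converting strings to floating-point stands in for a real pipeline step.

raw_amounts = ["19.99", "249.00", "", "N/A", "7.50"]

failures = 0
for value in raw_amounts:
    try:
        float(value)  # the "transformation": string to floating-point
    except ValueError:
        failures += 1  # blank or nonconforming values surface here

error_rate = failures / len(raw_amounts)
print(f"transformation error rate: {error_rate:.0%}")  # 40% in this example
```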
4. Amounts of dark data
Dark data is data that your organization is collecting and storing, but not using. Large quantities of dark data often suggest underlying data quality problems, simply because no one is bothering to look at it. Why is that an issue? Because most organizations haven’t fully realized the potential value of the data they already have. If you want to put that data to good use, now is the time to bring it out of the dark and take a look at its accuracy, consistency, and completeness.
5. Email bounce rates
Digital marketing campaigns can only be successful if you’re working with a high-quality email list. Customer and prospect data can decay quickly, leading to poor-quality data sets and poorly performing campaigns. Low data quality is one of the most common causes of email bounces: errors, missing values, and outdated records cause you to send emails to invalid addresses.
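The calculation itself is the one from the table above; here is a minimal Python sketch with invented send and bounce counts:

```python
# A minimal sketch of the bounce-rate calculation; counts are illustrative.

def bounce_rate(bounced: int, sent: int) -> float:
    """Percentage of sent emails that bounced."""
    if sent == 0:
        raise ValueError("no emails were sent")
    return bounced / sent * 100

print(f"{bounce_rate(bounced=340, sent=20_000):.2f}%")  # 1.70%
```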
6. Data storage costs
Are your data storage costs rising while the amount of data that you use stays the same? This can often be an indicator of data quality issues. If you are storing data without using it, it could be because the data has quality problems. If, conversely, your storage costs decline while your data operations stay the same or grow, you’re likely improving on the data quality front.
7. Data time-to-value
How quickly is your team able to turn data into business value? The answer can reveal a lot about the overall quality of your data. If data transformations generate a lot of errors, or if human intervention and manual cleanup are required, that can be a sign that your data quality is not what it should be. While many different factors affect data time-to-value, data quality problems are one of the most common points of friction.
The metrics that make the most sense for you to measure will depend upon the specific needs of your organization, of course. These are just guidelines for measuring data quality. Precisely offers data quality solutions that support data governance and compliance initiatives and produce a complete, single, and trusted view of your data.
Data Integrity Completes the Picture
Data quality is just one element within the bigger picture of data integrity. Data integrity goes beyond data quality to include integration, data enrichment, and location intelligence.
When critical linkages between data elements are missing, that data is said to lack integrity. Consider, for example, a Sales Transactions table in which each customer ID points to a record in the Customer table. If a customer record is deleted without updating related tables, records in the Sales Transactions table that point to that customer become “orphans” because their parent record no longer exists. This represents a loss of referential integrity. An appropriate metric for data integrity would be the number of orphan records present in a database.
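Here is a minimal sketch of that orphan-record metric using a pandas anti-join; the table and column names are assumptions for illustration. In a relational database, the same check would typically be done with an outer join or enforced outright by a foreign-key constraint:

```python
# A minimal sketch of a referential-integrity check: count "orphan" rows in a
# sales_transactions table whose customer_id has no match in customers.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3]})
sales_transactions = pd.DataFrame({
    "transaction_id": [100, 101, 102, 103],
    "customer_id":    [1, 2, 99, 3],  # 99 has no parent customer record
})

# Anti-join: keep transactions whose customer_id is absent from customers.
orphans = sales_transactions[
    ~sales_transactions["customer_id"].isin(customers["customer_id"])
]
print(len(orphans))  # the data-integrity metric: number of orphan records
print(orphans)
```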
Your organization must have some kind of data quality assessment plan in place. The seven metrics we’ve discussed here offer a good starting point. For a deeper dive into data quality measurement, read our free eBook: 4 Ways to Measure Data Quality.