Data Quality Impacts on Downstream Analytics and Machine Learning Initiatives
Advanced analytics, including artificial intelligence and machine learning (AI/ML), are hot areas for investment right now, and for good reason. Organizations have access to more data than ever before, and to the extent that business leaders can draw useful insights from that information, they stand to gain significant new efficiencies and a meaningful advantage over their competitors.
Unfortunately, diving head-first into advanced analytics and AI/ML without first attending to data integrity carries real risks. Data integrity encompasses data quality, which ensures that information is accurate, complete, and consistent. But data quality is just one of the essential elements of data integrity; the others are data integration, data enrichment, location intelligence, and data governance.
As the global leader in data integrity, Precisely addresses all five of these elements, ensuring that business leaders have access to accurate, complete, and fully contextual information. We help businesses ensure that their data, and the insights and decisions driven by that data, can be fully trusted.
The Vital Importance of Data Quality
Machine learning has the potential to delve deeply into an organization’s data assets with the end goal of automating decisions or delivering insights to support better decisions. Property and casualty insurers, for example, are applying sophisticated algorithms to examine claims information as soon as it becomes available. By analyzing both structured and unstructured data and comparing it to a growing body of information about past claims, including the severity and type of damage, the location of the loss, geospatial factors surrounding the claim, and more, insurance companies can flag cases that merit closer attention, or highlight claims that should be expedited. Using Precisely’s powerful location intelligence technology, they can even pre-position adjusters in the areas likely to suffer the greatest losses from a major weather event.
These kinds of use cases are already delivering major efficiencies, and as they scale, the benefits grow even greater. At the same time, the errors and inefficiencies that arise from poor data quality scale up as well. In the insurance example above, poor data quality could cause the system to flag too many legitimate claims as potential fraud. Those false positives lead to wasted effort in the fraud department, not to mention frustrated customers.
Consider what happens when part of an organization’s decision-making process is delegated to a machine learning algorithm. If data quality is not maintained at sufficiently high levels, ML algorithms will “learn poorly” and may develop a skewed interpretation of reality, which then forms the basis for automated decisions or recommendations.
One company we know of deployed a web-based order form that asked customers to select their industry from a drop-down list. It was a required field, but it had no effect on pricing or on anything else the customer cared about. Not surprisingly, most customers simply selected the first choice on the list, so most of the company’s clients ended up identified as “aerospace and defense” firms.
Imagine feeding the resulting data to a downstream analytics platform or ML algorithm. Any analysis that hinged on the customer’s self-identified industry would be rendered unreliable. To make matters worse, the people relying on that analysis might not even be aware of the problem. Sooner or later, poor data quality leads to poor business decisions.
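A simple profiling check can surface this kind of problem before the data reaches a model. The Python sketch below is a minimal illustration, not a Precisely feature: it measures how much of a field is taken up by its single most common value and flags the field when that value is the form’s first drop-down option. The function names, the `first_option` parameter, and the 50% threshold are hypothetical choices made for this example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_value_share(values: Iterable[Optional[str]]) -> Tuple[Optional[str], float]:
    """Return the most common non-empty value and its share of non-empty values."""
    filled = [v for v in values if v not in (None, "")]
    if not filled:
        return None, 0.0
    value, count = Counter(filled).most_common(1)[0]
    return value, count / len(filled)


def flag_default_bias(values: Iterable[Optional[str]], first_option: str,
                      threshold: float = 0.5) -> bool:
    """Flag a field whose first drop-down option dominates at or above `threshold`.

    The 0.5 default is a hypothetical cut-off; a real check would be tuned
    against the expected distribution of the field.
    """
    value, share = dominant_value_share(values)
    return value == first_option and share >= threshold


# Hypothetical sample: 7 of 10 customers kept the first option.
industries = ["Aerospace and defense"] * 7 + ["Retail", "Healthcare", "Banking"]
print(dominant_value_share(industries))  # ('Aerospace and defense', 0.7)
print(flag_default_bias(industries, "Aerospace and defense"))  # True
```

A high share on its own does not prove the data is wrong; the point of a check like this is to route a suspicious field to someone who knows how the form was actually used.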
Data Discovery and Profiling
Effective data quality programs begin with data discovery: cataloging and profiling files and data stores to build a model of your organization’s metadata. Precisely’s data quality portfolio includes powerful tools that apply semantic intelligence and a collaborative glossary to classify, locate, and tag data for quick access, collaboration, and streamlined insights.
Discovery and profiling tools let business users evaluate data sources and quickly understand which ones are best suited to a particular use case. Data analysts can access the information they need anywhere in the organization and profile it for accuracy and completeness.
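To make “profile it for accuracy and completeness” concrete, here is a minimal Python sketch that computes, for each column in a set of records, the share of rows that are filled in, the share of filled values that pass a validity check, and the number of distinct values. The `profile` function, the sample records, and the email and ZIP validators are hypothetical; the checks test format only, which is a rough proxy for accuracy rather than proof of it.

```python
import re
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]


def profile(records: List[Record],
            validators: Dict[str, Callable[[str], bool]]) -> Dict[str, Dict[str, float]]:
    """Return completeness, validity, and distinct-count figures per column."""
    columns = sorted({key for record in records for key in record})
    report: Dict[str, Dict[str, float]] = {}
    for column in columns:
        # A missing key counts the same as an empty value.
        values = [record.get(column) for record in records]
        filled = [v for v in values if v not in (None, "")]
        check = validators.get(column)  # columns without a validator pass by default
        valid = [v for v in filled if check is None or check(v)]
        report[column] = {
            "completeness": len(filled) / len(records),
            "validity": len(valid) / len(filled) if filled else 0.0,
            "distinct": len(set(filled)),
        }
    return report


# Hypothetical sample data and format-only validators.
records = [
    {"id": "1", "email": "a@example.com", "zip": "02110"},
    {"id": "2", "email": "", "zip": "0211"},
    {"id": "3", "email": "c@example", "zip": "10001"},
    {"id": "4", "email": "d@example.com"},
]
validators = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
}

for column, stats in profile(records, validators).items():
    print(column, stats)
```

In this sample, the email and ZIP columns are each 75% complete, and a third of their filled values fail the format check, which is the kind of signal that tells an analyst whether a source is fit for a given use case.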
Flexible, Scalable Data Quality Management
In today’s enterprises, data quality is never a “one and done” proposition. It requires an ongoing commitment to ensuring that corporate data assets are complete, consistent, accurate, and available to the right users at the right time. To make that work at scale, however, companies need technology that can automate processes, monitor potential issues in real time, and assign workflows to the right resources automatically.
Scalable data quality requires a rules-based approach, but one that can be adapted quickly to changing business needs, data sources, and enterprise infrastructures, including big data and the cloud. Open APIs let enterprises connect custom and third-party applications while controlling and managing data quality services centrally from one location. In this way, enterprises with complex technology environments can apply the same rule sets and standards across an unlimited number of applications and systems, in batch or in real time, whether those systems run on-premises or in the cloud.
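The idea of one centrally defined rule set serving both batch and real-time processing can be sketched in a few lines. In this hypothetical Python example, each rule is plain data (a name, a field, and a predicate), and the same `RULES` list drives both a single-record check and a batch check. The rule names and fields are invented for illustration; the sketch does not model Precisely’s APIs or how rules are distributed across systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    name: str
    field: str
    check: Callable[[str], bool]  # returns True when the value passes


# One shared rule set (hypothetical rules) reused by every caller.
RULES: List[Rule] = [
    Rule("customer_id_present", "customer_id", lambda v: bool(v.strip())),
    Rule("country_is_iso2", "country",
         lambda v: len(v) == 2 and v.isalpha() and v.isupper()),
]


def validate(record: Record, rules: Iterable[Rule] = RULES) -> List[str]:
    """Real-time path: return the names of the rules a single record fails."""
    # A missing field is checked as an empty string.
    return [rule.name for rule in rules if not rule.check(record.get(rule.field, ""))]


def validate_batch(records: Iterable[Record],
                   rules: Iterable[Rule] = RULES) -> Dict[int, List[str]]:
    """Batch path: apply the same rules to every record, keeping only failures."""
    rule_list = list(rules)
    failures: Dict[int, List[str]] = {}
    for index, record in enumerate(records):
        failed = validate(record, rule_list)
        if failed:
            failures[index] = failed
    return failures


print(validate({"customer_id": "C-001", "country": "US"}))  # []
print(validate_batch([
    {"customer_id": "C-001", "country": "US"},
    {"customer_id": "C-002", "country": "Germany"},
]))  # {1: ['country_is_iso2']}
```

Because the rules live in one place rather than being re-implemented inside each application, changing a rule changes it for every caller, in both modes.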
Today’s global enterprises also need worldwide address verification that draws on local, country-by-country databases and applies each country’s postal rules to clean and correct name and address data. As the world’s leader in location intelligence, Precisely offers unmatched capabilities for address verification and geocoding.
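The notion of applying “the appropriate country’s postal rules” can be illustrated, in a greatly reduced form, as a lookup from country code to a postal-code format. The Python sketch below checks and lightly normalizes postal codes for three countries. The patterns are simplified assumptions (for example, the Canadian pattern does not exclude letters that Canada Post never uses), and a format check falls far short of true address verification, which compares addresses against reference databases.

```python
import re
from typing import Optional

# Simplified, format-only postal-code rules for a few countries (assumptions).
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),           # ZIP or ZIP+4
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),  # e.g. K1A 0B1
    "DE": re.compile(r"\d{5}"),
}


def normalize_postal_code(country: str, code: str) -> Optional[str]:
    """Return a cleaned postal code if it fits the country's format, else None.

    None is also returned for countries with no rule in this sketch.
    """
    pattern = POSTAL_PATTERNS.get(country.upper())
    if pattern is None:
        return None
    cleaned = code.strip().upper()
    if not pattern.fullmatch(cleaned):
        return None
    # Canadian codes are conventionally written with a space after the third character.
    if country.upper() == "CA" and " " not in cleaned:
        cleaned = cleaned[:3] + " " + cleaned[3:]
    return cleaned


print(normalize_postal_code("ca", "k1a0b1"))   # K1A 0B1
print(normalize_postal_code("US", "0211"))     # None
```

Keying the rules by country keeps each country’s conventions separate, so adding a country means adding a rule rather than changing the validation logic.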
Precisely: A One-Stop Shop for Data Integrity
As businesses work to augment critical processes with advanced analytics and AI/ML, data quality is imperative. Business leaders must ensure that their organizations’ data assets are clearly understood and that the information they contain is accurate, consistent, and complete. Precisely offers a complete portfolio of data integrity products, including data quality solutions that address a wide range of requirements, from small companies to large enterprises.
In addition to our data quality portfolio, Precisely offers solutions for integration, data enrichment, location intelligence, and data governance.