eBook

Looking For a Data Catalog?

There’s a lot of buzz around data cataloging tools right now — and a growing number of solutions from more and more vendors. What exactly is a data catalog? And how do you make sure you are not getting lost in the process of selecting the right catalog to meet your needs?

Don’t get lost. Here’s your guide to getting started.

Read this eBook to learn the basics of what a catalog is and how it works, what business challenges it can help solve, and how to make sure you are avoiding common pitfalls and choosing the right one for your needs.

 

What is it and how does it work?

In a nutshell, a data catalog is a place that shows what data assets you have (for example, reports, databases, and websites that contain data) and where they are located.
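For technically minded readers, that definition can be pictured as a small, searchable inventory. The sketch below is illustrative only and is not how any particular product works: the `DataAsset` and `DataCatalog` names, the tags, and the example locations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """One catalog entry: what the asset is and where it lives."""
    name: str
    kind: str          # e.g. "report", "database", "website"
    location: str      # hypothetical locator, made up for this example
    tags: tuple = ()


class DataCatalog:
    """A toy in-memory inventory of data assets (illustration only)."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        """Add an asset to the inventory, keyed by its name."""
        self._assets[asset.name] = asset

    def search(self, term):
        """Return assets whose name or tags contain the term, sorted by name."""
        term = term.lower()
        hits = (
            a for a in self._assets.values()
            if term in a.name.lower() or any(term in t.lower() for t in a.tags)
        )
        return sorted(hits, key=lambda a: a.name)

    def locate(self, name):
        """Answer "where is it?" for a known asset."""
        return self._assets[name].location


catalog = DataCatalog()
catalog.register(DataAsset("customer_orders", "database",
                           "postgres://sales-db/orders", ("customer", "sales")))
catalog.register(DataAsset("churn_dashboard", "report",
                           "https://bi.example.com/churn", ("customer",)))
catalog.register(DataAsset("web_traffic", "website",
                           "https://analytics.example.com/traffic", ("marketing",)))

matches = [a.name for a in catalog.search("customer")]
```

Here `catalog.search("customer")` answers "what data do I have?" for a topic, and `catalog.locate("web_traffic")` answers "where is it?". A real catalog would populate these entries automatically from your data sources rather than by hand.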

How does a data catalog work, and how does it help organizations get a handle on their data and, more importantly, use it to make decisions and drive business value? Pictured below is a simple graph that illustrates how a data catalog solution can work to deliver business outcomes.

How does an optimal data catalog work?

Here are the 5 stages showing how a data catalog can deliver on the business outcome: “I want to delight my customer.”

[Figure: How does an optimal data catalog work]

 

Data governance and stewardship

Across all stages, data definitions must be set up based on rules and standards so that you can find the available data throughout the enterprise, know where it is, and ensure it is trustworthy. Finding your data is one step in the process. Connecting it to business outcomes provides the full solution.

Do I need a data catalog?

With the tremendous growth in the volume of data, increased access to multiple data sources, and new compliance regulations, organizations are working to “get a handle” on their enterprise-wide data. To do so, they must be able to answer these questions:

  • What data do I have?
  • Where is it?
  • Where did it come from?
  • How is it being used?

As a result, data catalog solutions have gone from being a “nice to have” to a “must have” in the arsenal of data governance capabilities. In the research report Data Catalogs are the New Black in Data Management and Analytics Research, Gartner reports that demand for data catalogs is soaring as organizations struggle to inventory their distributed data assets to facilitate data monetization and conform to regulations.

How do you know if you need a data catalog?

If you find yourself saying the following, you may need a data catalog (or data catalog + governance) solution:

“I need better analytics!”

Many organizations are asking how to gain more value from analytics and have better visibility into their data. The introduction of IoT and digital transformation has resulted in an abundance of data. Now organizations need to find the available data and confirm it’s fit for purpose so it can be used for decision-making.

“I’ve invested in B.I., but is the reporting data correct?”

There has been a surge in the investment in B.I. software. Locating the right data for analysis and reporting is a challenge that must be solved when implementing B.I. While some organizations are able to locate their data, they cannot identify the source to confirm it’s valid. Still others are finding conflicting results between two different reports.

“My data lake has become a data swamp.”

Your data lake seemed to be the answer to all of your problems. But now, business stakeholders are unable to access the information they need from the data lake. No one is certain what data exists in the lake or how to access it.

 


“How do I prepare my organization for A.I.?”

As A.I. moves into the mainstream, organizations are finding that identifying the right data to inform the algorithm is critical. This applies to the input data as well as to the features of the data itself, including how the data is tagged and whether it has the right metadata, user data, etc. The first step in this process, then, must be to discover and catalog the data.

 


In all of these cases, there is a common thread. Organizations must be able to answer, “What data do we have and where is it?” But they don’t only need to “find” their data. They also need greater data intelligence to understand how it connects to their enterprise metadata and, more importantly, to their business outcomes.

As organizations start to flock to the most popular solutions, they should heed Gartner’s advice to take the time to find the “right” solution and make sure that it can be aligned with organizational initiatives. As stated in the Gartner research: “Data catalog projects will fall short of their full potential if data and analytics leaders don’t link them to broader data management needs.” See Pitfall 2.

What is the typical implementation timeline and how do I avoid pitfalls?

A data catalog should be implementable within a few weeks to a few months. However, there are a few reasons why companies might experience more painful, less timely projects. If you have done your due diligence and selected a data catalog that is cloud-based, “on the stack,” and aligned with your enterprise data system and metadata management strategies, then it should be smooth sailing. However, if you have decided on a catalog that requires up-front customization, specific hardware, or a team of specialized developers, then you might be looking at a costly project.

Pitfall 1: Don’t take a vendor’s word for it

Vendors want to sell their solution, so weaknesses and limitations are sometimes glossed over. It is your job to make sure that you aren’t falling for “market-tecture.” When deciding on a catalog, check popular review sites like Gartner Peer Insights, speak with analysts, and make sure you ask references about implementation.

Pitfall 2: Don’t be shortsighted

According to Gartner, companies should “Avoid data catalogs that do not have the ability to scale out beyond tactical use case requirements and connect to the broader enterprise metadata management and data initiatives.” Some companies are choosing data catalogs based on a single, tactical use case, such as inventorying the data in their data lakes. It’s important to understand that deploying a catalog for one tool or use will improve data usability, trust and shareability ONLY for that specific tool. This ultimately creates the need for a data catalog of all the data catalogs in your architecture. Make sure that you have evaluated options that span across multiple use cases and can address your broader needs.

Pitfall 3: Don’t assume that every catalog is usable by everybody

Some catalogs are built for a more technically minded user who is using SQL. These catalogs have some high-tech capabilities and provide a full picture of the technical lineage and provenance of every bit of data in the ecosystem. Others are built more for business users who don’t care about SQL or technical lineage, but who want a user-friendly view of the data that matters for the initiative they care about. Who is going to be using your catalog and for what reason? Make sure that you don’t try to force your business users into being IT coding experts. This could cause serious issues with adoption and ROI.
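To make “technical lineage” concrete for the technical audience, here is a minimal sketch of tracing a report back to its sources. It assumes lineage is stored as a simple mapping from each asset to the assets it is derived from. The asset names and the `UPSTREAM` mapping are hypothetical, and real catalogs typically harvest this information automatically.

```python
# Hypothetical lineage edges: each asset maps to the assets it is derived from.
UPSTREAM = {
    "quarterly_revenue_report": ["revenue_mart"],
    "revenue_mart": ["orders_raw", "refunds_raw"],
    "orders_raw": [],
    "refunds_raw": [],
}


def trace_sources(asset, upstream=UPSTREAM):
    """Return every asset that `asset` depends on, directly or indirectly."""
    seen = []
    stack = list(upstream.get(asset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        stack.extend(upstream.get(current, []))
    return sorted(seen)


def original_sources(asset, upstream=UPSTREAM):
    """Return only the root sources (assets with nothing upstream of them)."""
    return [a for a in trace_sources(asset, upstream) if not upstream.get(a)]


all_inputs = trace_sources("quarterly_revenue_report")
roots = original_sources("quarterly_revenue_report")
```

A technical user might want the full chain in `all_inputs`, while a business user may only need to know that the report ultimately rests on the two sources in `roots`. That difference in detail is the usability gap this pitfall describes.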

How do I choose the best data catalog?

It’s essential to spend the time up front to identify what functionality is important to your organization. You might find that different groups have different needs. Having this list defined when you start your search will help ensure you’re selecting the right solution. At a bare minimum, data catalogs should be able to:

  • Discover what data is available
  • Identify where it is located
  • Provide information on whether that data is fit for purpose

Once you’ve checked the box on that basic functionality, there are several other considerations to ensure your catalog can be used to add business value in the future:

  • Will it provide real-time integration with your data sources so that they are continuously populating the data catalog with the data that is critical to you?
  • Is it easy to use for technical and non-technical users?
  • Can it search all your databases, on-premises and in the cloud?
  • Will you be able to connect your data assets directly to organizational goals and initiatives so that you can see and measure how data drives your business?
  • What augmented or AI/ML capabilities can drive greater operational efficiencies and data intelligence?
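One way to keep a vendor comparison honest is to turn the two lists above into a weighted scorecard. The sketch below is only an illustration: the criterion names paraphrase the bullets, while the weights, the 0 to 5 rating scale, and the sample ratings are made-up assumptions you would replace with your own.

```python
# Hypothetical weights: adjust these to reflect what matters to your organization.
CRITERIA_WEIGHTS = {
    "discovers_available_data": 3,
    "identifies_location": 3,
    "shows_fitness_for_purpose": 3,
    "real_time_integration": 2,
    "easy_for_all_users": 2,
    "searches_on_prem_and_cloud": 2,
    "links_to_business_goals": 2,
    "ai_ml_capabilities": 1,
}

# The "bare minimum" capabilities: a candidate lacking any of these is ruled out.
MUST_HAVES = (
    "discovers_available_data",
    "identifies_location",
    "shows_fitness_for_purpose",
)


def score_candidate(ratings):
    """Weighted score from 0-5 ratings; None if a must-have is rated 0 or missing."""
    if any(ratings.get(criterion, 0) == 0 for criterion in MUST_HAVES):
        return None
    return sum(weight * ratings.get(name, 0)
               for name, weight in CRITERIA_WEIGHTS.items())


# Made-up ratings for two fictional candidates, for illustration only.
candidate_a = {
    "discovers_available_data": 5,
    "identifies_location": 4,
    "shows_fitness_for_purpose": 3,
    "real_time_integration": 2,
    "easy_for_all_users": 4,
    "searches_on_prem_and_cloud": 5,
    "links_to_business_goals": 3,
    "ai_ml_capabilities": 1,
}
candidate_b = {
    "discovers_available_data": 5,
    "identifies_location": 5,
    # no rating for fitness for purpose, so this candidate is ruled out
    "easy_for_all_users": 5,
}

score_a = score_candidate(candidate_a)
score_b = score_candidate(candidate_b)
```

Treating the three basic capabilities as pass/fail gates, rather than as just more points, mirrors the idea of checking the box on basic functionality before weighing the other considerations.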
