From Data Swamp to Data Lake: Data Catalog

This is the second blog in a series that explains how organizations can prevent their Data Lake from becoming a Data Swamp, with insights and strategy from Perficient’s Senior Data Strategist and Solutions Architect, Dr. Chuck Brooks. Read the first blog here.

In the first article in this series, I explained the five capabilities necessary to prevent a Data Lake from becoming a Data Swamp. They are:

  1. Create a Data Catalog
  2. Create a Data Governance organization
  3. Implement data quality analysis and reporting
  4. Implement category-based security in the Data Lake
  5. Have multiple data zones inside the Data Lake

In this article, we will discuss the Data Catalog.

 

The Data Catalog and Metadata Management

A Data Catalog is a collection of metadata, combined with data management and search tools, that helps corporate knowledge workers find the data that they need. The Data Catalog serves as an inventory of available data and provides information to evaluate the usefulness and quality of data to answer business questions and make better business decisions.
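
To make the idea concrete, here is a minimal sketch of a catalog as an inventory of metadata records with a keyword search over them. The `CatalogEntry` fields, the sample datasets, and the `search_catalog` helper are hypothetical names chosen for illustration; they are not part of any product discussed in this series.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one dataset (all field names are illustrative)."""
    name: str
    description: str
    owner: str
    tags: set[str] = field(default_factory=set)


def search_catalog(entries: list[CatalogEntry], keyword: str) -> list[CatalogEntry]:
    """Return entries whose name, description, or tags contain the keyword."""
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.name.lower()
        or needle in e.description.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


# Hypothetical inventory of two datasets.
catalog = [
    CatalogEntry("sales_orders", "Daily order lines from the web store",
                 "sales-ops", {"revenue", "orders"}),
    CatalogEntry("crm_contacts", "Customer contact records",
                 "marketing", {"customer", "pii"}),
]

matches = search_catalog(catalog, "customer")
print([e.name for e in matches])  # prints ['crm_contacts']
```

Even in this toy form, the entry records who owns the data and how it is described, which is the information a knowledge worker needs to judge whether a dataset is useful.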

 

Data Catalogs have become the standard for metadata management in the age of big data and self-service business intelligence. The metadata that knowledge workers need in order to understand and use data is far more extensive today than it was in the past. A successful Data Lake transformation and adoption depends on the ability of knowledge workers to find, access, use, and reuse data in the Data Lake. Ensuring success with enterprise data requires formally integrating multiple lines of business, technologies, and processes through data management and governance to create a comprehensive data catalog.

A data catalog organizes the technical details about data assets (the metadata) into defined, meaningful, and searchable business assets that give all knowledge workers a consistent understanding. It is essential to knowledge workers because it combines and organizes details about the data assets in the data lake and presents them in an easy-to-understand format. It clarifies data definitions, synonyms, and essential business attributes so that all knowledge workers understand the data and can leverage it as an asset. When knowledge workers have important questions about data, they can turn to the data catalog, which identifies data owners, stewards, and subject matter experts and so makes collaboration across business units easier.

The data catalog will keep your Data Lake from becoming a Data Swamp by providing:

  • Improved productivity and less time spent by teams searching for relevant information or data
  • Increased visibility into key datasets that exist in the data lake
  • Fewer duplicate purchases of similar datasets by different teams
  • Lineage that gives knowledge workers a clear view of the flow and dependencies of data through the organization and its business processes
  • Improved collaboration between knowledge workers
  • Faster processes to access and interpret the data
  • Easier compliance with growing international privacy and reporting regulations
  • Common KPIs and data definitions that make data comparable and understandable
  • Data relevancy and usage tracking
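
The lineage item in the list above can be pictured as a dependency graph. The sketch below is illustrative only: the dataset names and the `upstream_of` helper are assumptions, and a real catalog would derive this graph from pipeline metadata rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "revenue_dashboard": ["monthly_revenue"],
    "monthly_revenue": ["sales_orders", "fx_rates"],
    "sales_orders": ["raw_web_orders"],
    "fx_rates": [],
    "raw_web_orders": [],
}


def upstream_of(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every dataset that `dataset` depends on, nearest sources first."""
    seen: set[str] = set()
    ordered: list[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        ordered.append(current)
        queue.extend(lineage.get(current, []))
    return ordered


print(upstream_of("revenue_dashboard", LINEAGE))
# prints ['monthly_revenue', 'sales_orders', 'fx_rates', 'raw_web_orders']
```

A breadth-first walk is used here so that the closest sources are listed first, which matches how a knowledge worker would usually trace a questionable number back to its origin.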

 

Google’s Data Catalog (now part of Dataplex) and Perficient’s Frameworks

 

Figure: Google’s Data Catalog and Perficient’s Metadata Manager

The Google Data Catalog (now part of Dataplex) helps knowledge workers understand data assets in Google Cloud and beyond. Integrations with BigQuery, Pub/Sub, Cloud Storage, and many connectors provide a unified view and a tagging mechanism for technical and business metadata. Knowledge workers across the organization can find or tag data through a UI built with the same search technology as Gmail, or through API access.

Perficient’s Metadata Manager is a framework that enhances the Google Data Catalog and offers a UI that makes metadata tagging and searching easier for knowledge workers and data stewards. Perficient Metadata Manager also provides data quality analysis and reporting capabilities.

 

Read the next blog in the series here.

Perficient’s Cloud Data Expertise

The world’s leading brands choose to partner with us because we are large enough to scale major cloud projects, yet nimble enough to provide focused expertise in specific areas of your business. Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We’ll assess your current data and analytics issues and develop a strategy to guide you to your long-term goals. Learn more about our Google Data capabilities here.

Download the guide, Becoming a Data-Driven Organization With Google Cloud Platform, to learn more about Dr. Chuck’s GCP data strategy.


Chuck Brooks

Dr. Chuck is a Senior Data Strategist / Solution Architect. He is a technology leader and visionary in big data, data lakes, analytics, and data science. Over a career that spans more than 40 years, Dr. Chuck has developed many large data repositories based on advancing data technologies. Dr. Chuck has helped many companies become data-driven and develop comprehensive data strategies. The cloud is the modern ecosystem for data and data lakes. Dr. Chuck’s expertise lies in the Google Cloud Platform, Advanced Analytics, Big Data, SQL and NoSQL Databases, Cloud Data Management Engines, and Business Management Development technologies such as SQL, Python, Data Studio, Qlik, PowerBI, Talend, R, Data Robot, and more. The following sales enablement and data strategy content draws on Dr. Chuck’s 40-year career in the data space. For more information or to engage Dr. Chuck, contact him at chuck.brooks@perficient.com.
