Data volume and the frequency of data changes have grown rapidly in the course of digitalization. The result is a very heterogeneous data landscape with many different data sources. To realize the advantages of big data, the key success factor is greater transparency and fast access to the data. In previous information management projects, CAMELOT observed that many data-driven companies are exploring use cases to improve big data governance.
What Is a Data Catalog and Why Is It Needed?
A Data Catalog (DC) is designed to help users find the right information quickly. This need has become even more important in complex data environments with multiple sources. Appropriate tool support is required that lets users work with the information collaboratively within a defined framework of workflows and rules. Data should be liberated through fast access and searchability.
Who Needs a Data Catalog?
A DC tool can be used across different domains such as Procurement, Sales, Finance and SCM. Data analysts, data scientists, data stewards and other roles need to find and understand data. For example, a data scientist needs to access multiple data domains and sources, e.g. from an ERP or CRM system. Through a workflow, the domain expert, e.g. a data steward for ERP material data, can grant access to domain-specific tables. This governance is important because, for example, under GDPR not all data should be accessible to every user. With strict governance across all processes and roles, a DC supports compliance and privacy.
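The access workflow described above can be sketched in a few lines of Python. This is a minimal illustration only, not how any particular DC product implements it: the class names, the user names and the table name `material_master` are all hypothetical, and a real tool would add persistence, audit trails and notifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class AccessRequest:
    requester: str
    table: str
    domain: str
    status: Status = Status.PENDING


@dataclass
class AccessWorkflow:
    # Hypothetical mapping: data domain -> responsible data steward.
    stewards: dict[str, str]
    # (user, table) pairs for which access has been granted.
    grants: set[tuple[str, str]] = field(default_factory=set)

    def request(self, requester: str, table: str, domain: str) -> AccessRequest:
        """Open a pending access request for a table in a given domain."""
        return AccessRequest(requester, table, domain)

    def decide(self, req: AccessRequest, decided_by: str, approve: bool) -> AccessRequest:
        """Only the steward of the request's domain may approve or reject it."""
        if self.stewards.get(req.domain) != decided_by:
            raise PermissionError(f"{decided_by} is not the steward of {req.domain}")
        req.status = Status.APPROVED if approve else Status.REJECTED
        if approve:
            self.grants.add((req.requester, req.table))
        return req

    def can_read(self, user: str, table: str) -> bool:
        """Access exists only after an approved request."""
        return (user, table) in self.grants


# Example: a data scientist asks for ERP material data.
workflow = AccessWorkflow(stewards={"erp_material": "steward_anna"})
req = workflow.request("scientist_ben", "material_master", "erp_material")
workflow.decide(req, decided_by="steward_anna", approve=True)
```

The point of the design is that access is never granted implicitly: a user can read a table only after the steward responsible for that domain has approved a request.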
How Can a Data Catalog Be Used?
It is important that the DC contains and structures the business definitions of the data so that users in different departments can work with it independently. This clearly differentiates it from a data model, because the user group is not necessarily limited to data users. With a broad set of functionalities, a DC can turn raw data from different sources into consumption-ready data for various users. It provides certain aspects of Data Visualization, such as scorecards, data quality dashboards and reports. Data Governance is another function, with role-based views and the ability to use workflows. Additionally, functions in the areas of Data Collaboration, Data Analytics, Data Modelling, Data Integration or Metadata Management can be part of a DC platform.
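To make the idea of structured business definitions more concrete, the following Python sketch models a catalog entry and a simple keyword search over it. It is an assumption-laden illustration: the `CatalogEntry` fields, the `search` function and the two sample entries are invented for this example and do not reflect a specific DC product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str                 # technical name of the data set
    source: str               # originating system, e.g. ERP or CRM
    domain: str               # business domain that owns the data
    business_definition: str  # plain-language meaning for business users
    tags: Tuple[str, ...] = ()


def search(entries: List[CatalogEntry], query: str) -> List[CatalogEntry]:
    """Return entries whose name, definition or tags contain every query term.

    Matching is a case-insensitive substring check, which is far simpler
    than the search a real catalog would offer.
    """
    terms = query.lower().split()
    results = []
    for entry in entries:
        haystack = " ".join([entry.name, entry.business_definition, *entry.tags]).lower()
        if all(term in haystack for term in terms):
            results.append(entry)
    return results


# Hypothetical sample entries from two source systems.
catalog = [
    CatalogEntry(
        name="material_master",
        source="ERP",
        domain="Procurement",
        business_definition="One row per material with base unit and material group.",
        tags=("material", "master data"),
    ),
    CatalogEntry(
        name="customer_accounts",
        source="CRM",
        domain="Sales",
        business_definition="Active customer accounts with assigned sales region.",
        tags=("customer",),
    ),
]

hits = search(catalog, "customer region")
```

Because the search runs over the business definition and not only the technical name, a user from a business department can find a data set without knowing how it is named in the source system.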
Across our projects, CAMELOT has observed that DCs are platforms that combine several trending topics. This leads to a wide variety of solutions with different strengths. For example, some perform very well in data quality, others in data collaboration. In explorative big data scenarios, DCs can successfully govern the data analytics part. It is therefore very important to guide our customers to the actual use case they want to tackle with tool support. Overall, CAMELOT sees great potential in using DCs to achieve data liberation, transparency and trust.
We would like to thank Sascha Zygar for his contribution to this article.