Collibra and Unity for catalog and lineage tracking
Overview
Creates, updates and represents existing Unity Catalog metastore, catalog, schema, table and column resources in Collibra.
The metadata archive is a Spring Boot integration that takes requests from Collibra and consumes Unity Catalog REST API services to discover and register Unity Catalog metastores, catalogs, schemas, tables and columns. At the time of this submission, Unity Catalog was in Private Preview and the UC REST API was limited in what it could offer. The following areas are not covered by this document today but are in scope for future releases:
- Delta Sharing APIs
- Databricks-internal APIs, e.g. all APIs related to column ACLs
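The discovery flow described above (catalogs, then schemas, then tables) can be sketched in Java as follows. This is a minimal illustration, not the integration's actual code: the class name `UnityCatalogWalker`, the `fetchNames` callback, and the `/api/2.1/unity-catalog` prefix are assumptions, and the API version available in a given workspace (especially a preview-era one) should be checked against the Databricks documentation. Authentication, paging, and JSON parsing are deliberately left to the caller.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: builds Unity Catalog listing URIs and walks the hierarchy. */
class UnityCatalogWalker {
    // Assumed API prefix; verify it against the REST API version of your workspace.
    static final String API_PREFIX = "/api/2.1/unity-catalog";

    private final String workspaceUrl;

    UnityCatalogWalker(String workspaceUrl) {
        // Drop a trailing slash so the joined paths never contain "//".
        this.workspaceUrl = workspaceUrl.endsWith("/")
                ? workspaceUrl.substring(0, workspaceUrl.length() - 1)
                : workspaceUrl;
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    URI catalogsUri() {
        return URI.create(workspaceUrl + API_PREFIX + "/catalogs");
    }

    URI schemasUri(String catalog) {
        return URI.create(workspaceUrl + API_PREFIX + "/schemas?catalog_name=" + enc(catalog));
    }

    URI tablesUri(String catalog, String schema) {
        return URI.create(workspaceUrl + API_PREFIX + "/tables?catalog_name=" + enc(catalog)
                + "&schema_name=" + enc(schema));
    }

    /** Joins name parts into a dotted full name, e.g. catalog.schema.table. */
    static String fullName(String... parts) {
        return String.join(".", parts);
    }

    /**
     * Walks catalogs, schemas and tables and returns the full name of every table.
     * fetchNames is a stand-in for the real HTTP call: given a listing URI it must
     * return the names of the resources found there.
     */
    List<String> discoverTables(Function<URI, List<String>> fetchNames) {
        List<String> fullNames = new ArrayList<>();
        for (String catalog : fetchNames.apply(catalogsUri())) {
            for (String schema : fetchNames.apply(schemasUri(catalog))) {
                for (String table : fetchNames.apply(tablesUri(catalog, schema))) {
                    fullNames.add(fullName(catalog, schema, table));
                }
            }
        }
        return fullNames;
    }
}
```

In a real integration, `fetchNames` would wrap an authenticated HTTP client (for example `java.net.http.HttpClient`, available since JDK 11) and extract the `name` fields from the JSON response. Keeping it as a callback lets the traversal be exercised with a stub, without a Databricks workspace.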
Requirements and user stories:
- As a data engineer, I want to give my data stewards and data users full visibility of our Databricks metastore resources by bringing metadata into a central location, making data available and easily accessible across the organization.
- As a data steward, I want to improve data transparency by helping establish a single enterprise-wide repository of assets, so every user can easily understand and discover the data relevant to them.
At the time of this submission, Unity Catalog was in Public Preview and the UC REST API was limited in what it could offer. Databricks regularly provides previews to give you a chance to evaluate and provide feedback on features before they’re generally available (GA). These preview releases can come in various degrees of maturity, each of which is defined in the Databricks documentation on preview releases. For more information about Databricks Runtime releases, including support lifecycle and long-term support (LTS), see the Databricks runtime support lifecycle documentation.
Mar 2022 update: Unity Catalog is now in gated public preview. During this gated public preview, Unity Catalog has the following limitations:
- Python, Scala, and R workloads are supported only on Data Science & Engineering or Databricks Machine Learning clusters that use the Single User security mode. These workloads do not support dynamic views for row-level or column-level security.
- Unity Catalog can be used together with the built-in Hive metastore provided by Databricks. External Hive metastores that require configuration using init scripts are not supported.
- Overwrite mode for DataFrame write operations into Unity Catalog is supported only for managed Delta tables and not for other cases, such as external tables. In addition, the user must have the CREATE privilege on the parent schema and must be the owner of the existing object.
- Service principals are not supported as account-level identities.
Please refer to Unity Catalog Preview Limitations for more information.
Release Notes
Apr 2022 update: Welcome to the Data Lineage Private Preview! Unity Catalog now captures runtime data lineage for any table to table operation executed on a Databricks cluster or SQL endpoint. Lineage is captured at the granularity of tables and columns, and the service operates across all languages.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Cloud
Dependency
- Unity Catalog API
- Java Development Kit (JDK) 11
- Spring Boot framework v2.6.6
License and Usage Requirements
- Collibra Catalog
Release History
No previous versions of this listing are available.