Unity Catalog and Collibra integration
Overview
This Spring Boot integration consumes the data received from Unity Catalog and Lineage Tracking REST API services to discover and register Unity Catalog metastores, catalogs, schemas, tables, columns, and dependencies.
At the time of this submission, Unity Catalog was in Public Preview and the Lineage Tracking REST API was limited in what it provided. We expect both APIs to change as they become generally available. The following areas are not covered by this version, but are in scope for future releases:
- ACL
- ABAC
This version completes support for Databricks Delta Sharing.
Use cases
- As a data engineer, I want to give my data stewards and data users full visibility of our Databricks metastore resources by bringing metadata into a central location, so that data becomes available and easily accessible across the organization.
- As a data steward, I want to improve data transparency by helping establish an enterprise-wide repository of assets, so every user can easily understand and discover data relevant to them.
- As a data producer, I want to share data sets with potential consumers without replicating the data.
Databricks regularly provides previews to give you a chance to evaluate and provide feedback on features before they’re generally available (GA). These preview releases can come in various degrees of maturity, each of which is defined in this article. For more information about Databricks Runtime releases, including support lifecycle and long-term-support (LTS), see Databricks runtime support lifecycle.
Mar 2022 update: Unity Catalog is now in gated public preview. During this gated public preview, Unity Catalog has the following limitations.
Python, Scala, and R workloads are supported only on Data Science & Engineering or Databricks Machine Learning clusters that use the Single User security mode and do not support dynamic views for the purpose of row-level or column-level security.
Unity Catalog can be used together with the built-in Hive metastore provided by Databricks. External Hive metastores that require configuration using init scripts are not supported.
Overwrite mode for dataframe write operations into Unity Catalog is supported only for managed Delta tables and not for other cases, such as external tables. In addition, the user must have the CREATE privilege in the parent schema and must be the owner of the existing object.
Please refer to Databricks Unity Catalog General Availability | Databricks on AWS for more information.
May 2022 update: Welcome to the Data Lineage Private Preview! Unity Catalog now captures runtime data lineage for any table to table operation executed on a Databricks cluster or SQL endpoint. Lineage is captured at the granularity of tables and columns, and the service operates across all languages.
June 2022 update: Unity Catalog Lineage is now captured and catalogued both as asset relations and as custom technical lineage.
July 2022 update: Unity Catalog API will be switching from v2.0 to v2.1 as of Aug 11, 2022, after which v2.0 will no longer be supported.
August 2022 update: Unity Catalog is in Public Preview. During the preview, some functionality is limited. See Unity Catalog public preview limitations. To participate in the preview, contact your Databricks representative.
August 2022 update: Delta Sharing is now generally available, beginning with Databricks Runtime 11.1. For details, see Share data using Delta Sharing.
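As a rough illustration of the v2.1 switch noted in the July 2022 update, the sketch below builds a request that lists the schemas of one catalog. It is not taken from the integration's source: the class name, method names, and the workspace URL passed in are assumptions made for this example, and only the `/api/2.1/unity-catalog/schemas` path and bearer-token header follow the public Databricks REST conventions.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the integration's code base.
class UnityCatalogRequests {
    // API version segment; v2.0 is replaced by v2.1.
    static final String API_VERSION = "2.1";

    // Builds the URI that lists the schemas of one catalog.
    static URI listSchemasUri(String workspaceUrl, String catalogName) {
        String query = "catalog_name="
                + URLEncoder.encode(catalogName, StandardCharsets.UTF_8);
        return URI.create(workspaceUrl + "/api/" + API_VERSION
                + "/unity-catalog/schemas?" + query);
    }

    // Builds a GET request authenticated with a personal access token.
    static HttpRequest listSchemasRequest(String workspaceUrl,
                                          String catalogName,
                                          String token) {
        return HttpRequest.newBuilder(listSchemasUri(workspaceUrl, catalogName))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }
}
```

Keeping the version in a single constant means a future API version change touches one line rather than every call site.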
More details
Release Notes
Fixed critical common vulnerabilities and exposures:
- CWE-94: Improper Control of Generation of Code (‘Code Injection’)
- CWE-611: Improper Restriction of XML External Entity Reference
- CWE-400: Uncontrolled Resource Consumption
- CWE-285: Improper Authorization
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Apache Beam
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot framework v2.7.5
- Unity Catalog API
- Lineage Tracking API
- Apache Beam 2.42.0
- Python 3.9.13
- Java Runtime Environment 11
License and Usage Requirements
- Collibra Catalog
- Databricks Premium
Release History
Release Notes
Added a few additional resource properties.
Compatibility
- Spring Framework
- Unity Catalog API
- Lineage Tracking API
- Apache Beam
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
- Python 3.9.13
- Apache Beam 2.42.0
License and Usage Requirements
- Collibra Catalog
- Databricks Premium
Release Notes
Moved from the Core API to the Import API as we take steps toward Private Beta. We will fast-follow the initial GA release of this integration to add metadata and lineage capabilities as provided by Unity Catalog. “Support” during this phase is defined as the ability for customers to log issues in our beta tool for consideration in our GA version. There are no SLAs, and fixes will be made on a best-effort basis in the existing beta version. As soon as that functionality is ported to an Edge-based capability, we will migrate customers from Spring Boot to Edge-based ingestion. We will GA with the Edge-based capability. Delta Sharing remains under validation.
Compatibility
- Spring Framework
- Unity Catalog API
- Lineage Tracking API
- Apache Beam
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
- Apache Beam
License and Usage Requirements
- Collibra Catalog
- Databricks Premium
Release Notes
As part of the release, the following features are released:
- New workflows, including deleting shares and recipients
- Requests are routed to the right app when there are multiple metastores
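One way to picture routing across several metastores is a simple lookup from metastore ID to the app instance that serves it. The sketch below is only an assumption about how such routing could work; the class name, the ID strings, and the URLs are hypothetical and do not come from the integration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical router: maps a metastore ID to the base URL of the app serving it.
class MetastoreRouter {
    private final Map<String, String> appUrlByMetastore;

    MetastoreRouter(Map<String, String> appUrlByMetastore) {
        // Defensive, immutable copy so later changes to the caller's map have no effect.
        this.appUrlByMetastore = Map.copyOf(appUrlByMetastore);
    }

    // Returns the app URL for the metastore, or empty if none is registered.
    Optional<String> appUrlFor(String metastoreId) {
        return Optional.ofNullable(appUrlByMetastore.get(metastoreId));
    }
}
```

Returning an `Optional` leaves it to the caller to decide whether an unknown metastore is an error or should fall back to a default app.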
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
License and Usage Requirements
- Collibra Catalog
Release Notes
As part of the release, the following features are released:
- Register Unity Catalog Resources: The sample flow that pulls all Unity Catalog resources from a given metastore and catalog into Collibra has been changed to better align with Edge. The workflow now expects a Community where the metastore resources are to be found, a System asset that represents the Unity Catalog metastore and helps construct the names of the remaining assets, and an optional domain. If a domain is specified, the app creates all metastore resources in that domain. If not, each schema is registered in its own domain.
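The optional-domain rule described above can be sketched as a small function. This is an illustration under stated assumptions, not the workflow's actual logic: the class and method names are hypothetical, and the `"<system> > <schema>"` naming format for per-schema domains is invented for the example.

```java
import java.util.Optional;

// Hypothetical helper illustrating the optional-domain rule.
class DomainResolver {
    // Returns the name of the domain in which a schema's assets are registered.
    static String domainFor(Optional<String> configuredDomain,
                            String systemName,
                            String schemaName) {
        // If a domain was specified, every resource goes there; otherwise each
        // schema gets its own domain (naming format assumed for this sketch).
        return configuredDomain.orElseGet(() -> systemName + " > " + schemaName);
    }
}
```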
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
License and Usage Requirements
- Collibra Catalog
Release Notes
As part of the release, the following features are released:
- Register Unity Catalog Resources: Sample flow that pulls all Unity Catalog resources from a given metastore and catalog to Collibra.
- Add Recipient to Delta Share: Sample flow that grants access to a delta share to a given recipient.
- Add Dataset to Delta Share: Sample flow that adds all tables found in a dataset to a given delta share.
- Add Table to Delta Share: Sample flow that adds a table to a given delta share.
- Add Table to Delta Share Form: Sample flow that adds a table to a delta share.
- Create Delta Share: Sample flow that creates a delta share.
- Create Delta Share Recipient: Sample flow that creates a delta share recipient.
- Delete Delta Share: Sample flow that deletes a delta share.
- Delete Delta Share Recipient: Sample flow that deletes a delta share recipient.
- Grant Delta Share Access to Recipient: Sample flow that grants access to a delta share to a given recipient.
- Remove Recipient from Delta Share: Sample flow that revokes access to a delta share from a given recipient.
- Remove Table from Delta Share: Sample flow that removes a table from a given delta share.
- Revoke Delta Share Access from Recipient: Sample flow that revokes access to a delta share from a given recipient.
- Get Recipient Activation Key: A simple workflow that shares the activation key when granted access to a given share.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
License and Usage Requirements
- Collibra Catalog
Release Notes
As part of the release, the following features are released:
- Remove table from delta share workflows
- Revoke delta share access from recipient workflows
- Fix: an exception was raised when tables without columns were found
- Fix: database views were created as tables if not found
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework v2.6.6
- Java Runtime Environment 11
- Unity Catalog API
- Lineage Tracking API
License and Usage Requirements
- Collibra Catalog
Release Notes
As part of the release, the following features are released:
- Limited Integration of Delta sharing APIs
- Addition of System attribute as part of Custom Technical Lineage
- Ability to combine multiple Custom Technical Lineage JSON(s)
Compatibility
- Spring Boot Framework
- Lineage Tracking API
- Unity Catalog API
- Delta Sharing API
- Collibra Data Intelligence Cloud
Dependency
- Spring Boot Framework 2.6.6
- Lineage Tracking API
- Unity Catalog API
- Delta Sharing API
- Java Development Kit
License and Usage Requirements
- Collibra Catalog
Release Notes
Version 1.0.7 allows metadata to be extracted from Databricks with a non-admin personal access token.
Compatibility
- Spring Boot Framework
- Lineage Tracking API
- Unity Catalog API
- Collibra Data Intelligence Cloud
Dependency
- Lineage Tracking API
- Unity Catalog API
- Spring Boot Framework 2.6.6
- Java Development Kit
License and Usage Requirements
- Collibra Catalog
Release Notes
Version 1.0.6 enhances the application to accept wildcard characters in schema names.
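The release note does not say which wildcard syntax is accepted. As a minimal sketch, assuming `*` stands for any run of characters and everything else is matched literally, schema names could be filtered like this (class and method names are hypothetical):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: filters schema names with a '*' wildcard (assumed syntax).
class SchemaWildcard {
    // Converts a wildcard pattern into a regex; '*' becomes ".*",
    // all other characters are quoted so they match literally.
    static Pattern toRegex(String wildcard) {
        String[] literals = wildcard.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Keeps only the schema names that match the whole wildcard pattern.
    static List<String> filter(List<String> schemaNames, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return schemaNames.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }
}
```

Quoting the literal parts keeps characters such as `.` in a schema name from being interpreted as regex syntax.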
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Unity Catalog API
- Lineage Tracking API
- Java Development Kit
- Spring Boot Framework 2.6.6
License and Usage Requirements
- Collibra Catalog
Release Notes
This release updates the Spring Boot app for changes in the Databricks Unity Catalog API.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Unity Catalog API
- Lineage Tracking API
- Java Development Kit
- Spring Boot Framework 2.6.6
License and Usage Requirements
- Collibra Catalog
Release Notes
Unity Catalog API will be switching from v2.0 to v2.1 as of Aug 11, 2022, after which v2.0 will no longer be supported.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Unity Catalog API
- Lineage Tracking API
- Java Development Kit 11
- Spring Boot Framework 2.6.6
License and Usage Requirements
- Collibra Catalog
Release Notes
June 2022 update: Unity Catalog Lineage is now captured and catalogued both as asset relations and as custom technical lineage.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Unity Catalog API
- Java Development Kit 11
- Spring Boot Framework 2.6.6
License and Usage Requirements
- Collibra Catalog
Release Notes
May 2022 update: Welcome to the Data Lineage Private Preview! Unity Catalog now captures runtime data lineage for any table to table operation executed on a Databricks cluster or SQL endpoint. Lineage is captured at the granularity of tables and columns, and the service operates across all languages.
Compatibility
- Spring Boot Framework
- Unity Catalog API
- Lineage Tracking API
- Collibra Data Intelligence Cloud
Dependency
- Unity Catalog API
- Java Development Kit 11
- Spring Boot framework v2.6.6
License and Usage Requirements
- Collibra Catalog
The following terms shall apply to the extent you receive the source code to this offering. Notwithstanding the terms of the Binary Code License Agreement under which this integration template is licensed, Collibra grants you, the Licensee, the right to access the source code to the integrated template in order to copy and modify said source code for Licensee’s internal use purposes and solely for the purpose of developing connections and/or integrations with Collibra products and services. Solely with respect to this integration template, the term “Software,” as defined under the Binary Code License Agreement, shall include the source code version thereof. Except with respect to the foregoing, all remaining terms of the Binary Code License Agreement shall apply to the license of integration template hereunder.
Paulo Taylor
Databricks Unity Catalog connected to Collibra – a game changer!