Azure Purview to Collibra Integration
Overview
Azure Purview is a data governance tool that supports automated data discovery, lineage identification, and data classification.
This Spring Boot integration invokes the Azure Purview Atlas API to retrieve details of specific entities, maps those details to the corresponding Collibra objects, and ingests them into Collibra via the Import API. Collibra users can therefore access the same information that is present in Purview, i.e., metadata about other systems.
The integration also gives users a degree of control over which entity types, and which subset of entities of a given type, are mapped into Collibra. For example, this could be:
- all the schemas, tables, and columns of the client’s databases
- all folders and files in the client’s file system.
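The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the integration's actual code: the class `PurviewToCollibraMapper`, the type-name table, and the output keys (`assetType`, `fullName`, `displayName`) are all assumptions made for the example. Entity types that are not in the table are skipped, which mirrors the idea of letting users choose which types are mapped.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: turns basic Purview entity details into a flat, Collibra-style asset description. */
final class PurviewToCollibraMapper {

    // Hypothetical mapping from Purview entity type names to Collibra asset type names.
    private static final Map<String, String> TYPE_MAPPING = new HashMap<>();
    static {
        TYPE_MAPPING.put("azure_sql_schema", "Schema");
        TYPE_MAPPING.put("azure_sql_table", "Table");
        TYPE_MAPPING.put("azure_sql_column", "Column");
    }

    /**
     * Returns an asset description for the given entity,
     * or an empty map when the entity type is not selected for mapping.
     */
    static Map<String, String> toAsset(String typeName, String qualifiedName, String name) {
        String assetType = TYPE_MAPPING.get(typeName);
        if (assetType == null) {
            return Collections.emptyMap(); // type not in scope: skip the entity
        }
        Map<String, String> asset = new LinkedHashMap<>();
        asset.put("assetType", assetType);
        asset.put("fullName", qualifiedName); // assumed: the qualified name serves as a unique full name
        asset.put("displayName", name);
        return asset;
    }
}
```

In a real deployment, the resulting descriptions would still have to be serialized into whatever payload the Collibra Import API expects; that step is omitted here.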
Use Cases
Azure Purview provides additional native cataloging capabilities for various applications within the Azure landscape. Bringing this metadata into Collibra lets business users navigate easily between more traditional data sources and cloud-based storage solutions, such as data lakes and warehouses. This gives the client a complete view of all technical assets in their enterprise catalogue, Collibra.
Elements in Scope
The Azure Purview integration currently ingests metadata for the following entities:
- Azure Storage Account
- Azure Datalake Gen2 Service
- Azure Datalake Gen2 Filesystem
- Azure Datalake Gen2 Path
- Azure Datalake Gen2 Resource Set
- Atlas Glossary Term
- Azure SQL Server
- Azure SQL DB
- Azure SQL Schema
- Azure SQL Table
- Azure SQL View
- Azure Blob Service
- Azure Blob Container
- Azure Blob Path
- Azure Blob Resource Set
- Azure Datalake Gen1 Account
- Azure Datalake Gen1 Path
- Azure Datalake Gen1 Resource Set
- MSSQL Instance
- MSSQL DB
- MSSQL Schema
- MSSQL Table
- MSSQL View
- Azure SQL DW DB
- Azure SQL DW Schema
- Azure SQL DW Table
- Azure SQL DW View
- Power BI Workspace (from which Power BI Server and Capacity are inferred)
- Power BI Dashboard (which contains Power BI Tiles)
- Power BI Report
- Power BI Dataset
- Power BI Table
- Power BI Column
- API
- API Endpoint
- API Schema
- API Field
- Azure Data Factory Pipeline
- Azure Data Factory Activity
- Azure Data Factory Activity Operation
To receive support on this item, you can engage our Professional Services team or post any questions in the Data Citizens Community.
Release Notes
Changed
- When more than 100,000 entities are found, the integration no longer ignores them; instead, it tries to split the search based on the creation time of these entities.
- Updated the Spring Boot Starter Parent version to 2.7.5.
- Replaced Springfox with SpringDoc for generating the Swagger UI page.
- Removed properties.file.version property.
- Gen1 files are now imported in batches.
- Removed the CMA file (which was incompatible with later versions of Collibra) and commented out the three custom attribute properties it contained: collibra.attribute.size, collibra.attribute.version, and collibra.attribute.nickname.
- The integration no longer fails when bulk entity detail requests to the Purview API still fail after all retries. Instead, the affected entities are skipped.
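The idea of splitting a search by creation time, mentioned in the first item of this list, can be sketched as a recursive halving of the time window. This is only an illustration under stated assumptions: `CreationTimeSplitter` and the `counter` function (which would report how many entities were created in a window, for example via a search request) are hypothetical, and the integration's actual strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

/** Illustrative only: splits a creation-time range until each window holds at most MAX_RESULTS entities. */
final class CreationTimeSplitter {

    // Threshold taken from the release note; treated here as a per-search limit (an assumption).
    static final long MAX_RESULTS = 100_000L;

    /**
     * Splits the half-open window [from, to) into sub-windows.
     * counter is a hypothetical function returning the number of entities created in [from, to).
     */
    static List<long[]> split(long from, long to, LongBinaryOperator counter) {
        List<long[]> windows = new ArrayList<>();
        collect(from, to, counter, windows);
        return windows;
    }

    private static void collect(long from, long to, LongBinaryOperator counter, List<long[]> out) {
        long count = counter.applyAsLong(from, to);
        if (count == 0) {
            return; // empty windows are dropped
        }
        // Accept the window if it is small enough, or if it cannot be halved any further.
        if (count <= MAX_RESULTS || to - from <= 1) {
            out.add(new long[] {from, to});
            return;
        }
        long mid = from + (to - from) / 2;
        collect(from, mid, counter, out);
        collect(mid, to, counter, out);
    }
}
```

Each returned window can then be searched separately. A window that still exceeds the threshold at the smallest possible width is returned as-is, so the caller would have to decide how to handle that edge case.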
Added
- Added delta sync feature.
- Added dependencies to resolve security vulnerabilities.
- Added memory optimizations.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform v2021+
Release History
Release Notes
- Fixes to avoid Out-of-Memory errors when processing many Purview entities
- Fixes to avoid RequestTimeout errors
- azure.token.refresh.seconds property
- 22 additional attributes (from “User Description” to “Schema ID” in the Attribute Types table)
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform v2021+
Release Notes
- Upgraded the Spring Boot version to 2.5.12 to address vulnerability CVE-2022-22965
- Parameterized the File Schema asset type using the property collibra.asset.file.schema
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform V2021+
Release Notes
Added
- API entities
- Azure Data Factory entities
- Whitelisting and Blacklisting
- 17 additional attributes (from Access Tier onwards in the Attribute Types table above)
- Tags on Field assets are also added to the parent File asset
- A flag property, azure.datalake.gen2.inferred.directories, that determines whether Gen2 directories are constructed by the integration from file paths or explicitly retrieved via the Purview API. When true, the directories are constructed from file paths (which requires fewer API calls and less processing, but leaves the Directory asset attributes empty). When false, the directories are explicitly retrieved via the Purview API (which requires more API calls and processing, but populates the Directory asset attributes).
- An unmappedAttributes file listing all the Purview entities which have been synced along with all corresponding Purview attributes which have not been mapped to Collibra attributes.
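To illustrate what constructing Gen2 directories from file paths (the azure.datalake.gen2.inferred.directories flag set to true) could look like, here is a minimal sketch. The class `InferredDirectories` and the assumption that paths are '/'-separated strings are made up for the example; the integration's actual path handling is not shown in this document.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: derives directory paths from file paths without any extra API calls. */
final class InferredDirectories {

    /** Returns every ancestor directory implied by the given file paths, in first-seen order. */
    static Set<String> fromFilePaths(List<String> filePaths) {
        Set<String> directories = new LinkedHashSet<>();
        for (String path : filePaths) {
            // Start at index 1 so that a leading '/' does not yield an empty directory name.
            int slash = path.indexOf('/', 1);
            while (slash != -1) {
                directories.add(path.substring(0, slash)); // the set ignores duplicates
                slash = path.indexOf('/', slash + 1);
            }
        }
        return directories;
    }
}
```

Because only the path strings are available in this mode, nothing beyond the directory name and location can be filled in, which is consistent with the note that Directory attributes stay empty when the flag is true.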
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform V2021+
Release Notes
Updated the Log4j version from 2.16 to 2.17 due to vulnerabilities.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform V2021+
Release Notes
Updated the Log4j 2 logging dependency to Apache Log4j version 2.16.0.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform V2021+
Release Notes
Initial release
Ingest metadata from Purview into Collibra.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
- Collibra Platform V2021+
License and Usage Requirements
The following terms shall apply to the extent you receive the source code to this offering.
Notwithstanding the terms of the Binary Code License Agreement under which this integration template is licensed, Collibra grants you, the Licensee, the right to access the source code to the integrated template in order to copy and modify said source code for Licensee’s internal use purposes and solely for the purpose of developing connections and/or integrations with Collibra products and services.
Solely with respect to this integration template, the term “Software,” as defined under the Binary Code License Agreement, shall include the source code version thereof. Except with respect to the foregoing, all remaining terms of the Binary Code License Agreement shall apply to the license of integration template hereunder.