Google Cloud Dataflow to Collibra Integration
Overview
Google Cloud Dataflow is used to manage and execute various data processing patterns. This integration helps analysts and data scientists understand where data comes from, where it has been, how it is being used, and who is using it. For example, it can be used to identify the root cause of bad data events and to assess the impact of data changes before making them.
This Spring Boot integration retrieves job details (pipelines) from Google Cloud Dataflow, transforms them, and upserts them to a Collibra Platform instance as assets and technical lineage.
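The retrieve, transform, and upsert flow can be illustrated with a minimal sketch. This is not the integration's actual code: the `JobDetails` and `AssetRecord` classes, their fields, the `toAsset` helper, and the sample values are all hypothetical, and the real integration uses the Collibra Integration Library rather than hand-written classes. The point it shows is that a deterministic full name lets a repeated run update an existing asset instead of creating a duplicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these classes come from the Collibra Integration Library
// or from the Google Cloud client libraries.
final class DataflowJobMapper {

    /** Hypothetical holder for the job details retrieved from Google Cloud Dataflow. */
    static final class JobDetails {
        final String projectId;
        final String jobId;
        final String jobName;

        JobDetails(String projectId, String jobId, String jobName) {
            this.projectId = projectId;
            this.jobId = jobId;
            this.jobName = jobName;
        }
    }

    /** Hypothetical asset-shaped record that an upsert step would send to Collibra. */
    static final class AssetRecord {
        final String fullName;
        final String displayName;
        final String typeName;
        final Map<String, String> attributes = new LinkedHashMap<>();

        AssetRecord(String fullName, String displayName, String typeName) {
            this.fullName = fullName;
            this.displayName = displayName;
            this.typeName = typeName;
        }
    }

    /** Transform step: derive a stable identifier so that re-running the sync is idempotent. */
    static AssetRecord toAsset(JobDetails job, String assetTypeName) {
        String fullName = job.projectId + "/" + job.jobId;
        AssetRecord asset = new AssetRecord(fullName, job.jobName, assetTypeName);
        asset.attributes.put("Job ID", job.jobId);
        return asset;
    }

    public static void main(String[] args) {
        JobDetails job = new JobDetails("demo-project", "job-001", "daily-orders-load");
        // Assumption: the asset type name here is illustrative only; the listing does not
        // state which in-scope element a job is mapped to.
        AssetRecord asset = toAsset(job, "Mapping Specification");
        System.out.println(asset.fullName + " -> " + asset.displayName + " [" + asset.typeName + "]");
    }
}
```

Mapping the same job twice yields the same full name, which is the property an upsert depends on.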
Use Cases
This integration will help you increase trust and data citizen engagement around data streams that pass through, are collected by, or are produced in Google Cloud Dataflow.
The data center makes it straightforward for you to track changes in your Google Cloud Dataflow metadata. This gives you the confidence to plan and engage with relevant business process owners accordingly.
Elements in Scope
The integration is designed to retrieve the following metadata:
- Mapping Specification
- Table
- System
- File
- GCS Bucket
- GCP Dataflow Step
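To make the element list above more concrete, the sketch below shows one way a pipeline's source or sink could be classified into one of the in-scope asset types. It is an assumption-heavy illustration, not the integration's logic: the `EndpointClassifier` class, the `assetTypeFor` method, and the classification rules are hypothetical. The only external facts relied on are that Cloud Storage locations use the `gs://` scheme and that BigQuery tables can be written as `project:dataset.table`.

```java
// Hypothetical sketch: the classification rules below are assumptions for illustration.
final class EndpointClassifier {

    /** Returns the in-scope asset type name that a source or sink string would map to. */
    static String assetTypeFor(String endpoint) {
        if (endpoint == null || endpoint.isEmpty()) {
            throw new IllegalArgumentException("endpoint must not be empty");
        }
        if (endpoint.startsWith("gs://")) {
            String rest = endpoint.substring("gs://".length());
            int slash = rest.indexOf('/');
            // Anything after "bucket/" is treated as an object path, hence a File;
            // a bare bucket (with or without a trailing slash) is a GCS Bucket.
            boolean hasObjectPath = slash >= 0 && slash < rest.length() - 1;
            return hasObjectPath ? "File" : "GCS Bucket";
        }
        // Assumption: table references arrive in the project:dataset.table form.
        if (endpoint.matches("[^:.\\s]+:[^:.\\s]+\\.[^:.\\s]+")) {
            return "Table";
        }
        throw new IllegalArgumentException("unrecognised endpoint: " + endpoint);
    }

    public static void main(String[] args) {
        System.out.println(assetTypeFor("gs://demo-bucket/exports/orders.csv"));
        System.out.println(assetTypeFor("gs://demo-bucket"));
        System.out.println(assetTypeFor("demo-project:sales.orders"));
    }
}
```

Rejecting unrecognised strings instead of guessing a default type keeps misclassified assets out of the catalog.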
To receive support on this item, you can engage our Professional Services team or post any questions in the Data Citizens Community.
Release Notes
- Added Docker files.
- Updated the Collibra Integration Library version to 1.1.10.
- Updated the Spring Boot version to 2.7.5.
- Code enhancements.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
License and Usage Requirements
Release History
Release Notes
- Updated the Spring Boot Starter Parent version to 2.5.12 (CVE-2022-22965).
- Updated the Collibra Integration Library version to 1.1.5.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
Release Notes
Updated the Spring Boot Integration Library dependency version in the pom.xml file to v1.1.3, which supports the latest Collibra Platform versions (v2022.01+).
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 1.8
- Spring Boot Integration Library
Release Notes
Updated the Log4j version from 2.16 to 2.17 due to vulnerabilities.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 8
- Spring Boot Framework
Release Notes
Updated the log4j2 logger dependency to Apache Log4j 2 version 2.16.0.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 8
- Spring Boot Framework
Release Notes
- Added retry support for WebClient requests.
- Added a note about using an external KeyStore file.
- Updated the log4j2.xml file to include the Collibra logger.
Compatibility
- Spring Boot Framework
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 8
- Spring Boot Framework
Release Notes
Initial release:
A Spring Boot integration that retrieves job details from Google Cloud Dataflow, transforms them, and upserts them to a Collibra Platform instance as assets and technical lineage (using the Collibra Lineage Harvester and the Collibra Integration Library).
Compatibility
- Spring Boot Framework 2.5.0
- Eclipse IDE
- Collibra Data Intelligence Cloud
- Collibra Data Intelligence On-Prem
Dependency
- Java Runtime Environment 8
- Spring Boot Framework 2.5.0
The following terms shall apply to the extent you receive the source code to this offering.
Notwithstanding the terms of the Binary Code License Agreement under which this integration template is licensed, Collibra grants you, the Licensee, the right to access the source code to the integrated template in order to copy and modify said source code for Licensee’s internal use purposes and solely for the purpose of developing connections and/or integrations with Collibra products and services.
Solely with respect to this integration template, the term “Software,” as defined under the Binary Code License Agreement, shall include the source code version thereof. Except with respect to the foregoing, all remaining terms of the Binary Code License Agreement shall apply to the license of integration template hereunder.
Bhagyashree Mohan
Dear Team, could you please provide the integration steps from GCP Dataflow to Collibra? I see that other products and services have documented steps, but I couldn't find any for this one, which I need for my POC.
Paulo Taylor
Thanks for your question. The documentation is now online.