Cloudera Metadata to Collibra
Lucid’s Collibra Cloudera integration template loads metadata for Cloudera assets into Collibra DGC. Metadata for Cloudera assets (HDFS and Hive) is extracted using the Cloudera Navigator REST APIs.
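As a rough illustration of the extraction step, the sketch below builds the kind of entity-search URL a client might send to the Cloudera Navigator REST API to list HDFS or Hive assets page by page. It is a minimal sketch under stated assumptions, not the template's implementation: the host name, the port (7187), the API version (`v9`), the `/entities` path and the Solr-style `sourceType:` query are assumptions to verify against the API documentation for your Navigator release, and `build_entities_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: adjust host, port and API version for your cluster.
NAVIGATOR_BASE = "http://navigator.example.com:7187"
API_VERSION = 9


def build_entities_url(source_type: str, limit: int = 100, offset: int = 0,
                       base: str = NAVIGATOR_BASE, version: int = API_VERSION) -> str:
    """Build an entity-search URL for one source type (e.g. "HDFS" or "HIVE").

    Assumption: entities are filtered with a Solr-style query such as
    "sourceType:HDFS" and paged with limit/offset parameters.
    """
    params = {"query": f"sourceType:{source_type}", "limit": limit, "offset": offset}
    # urlencode percent-encodes the ":" in the query value.
    return f"{base}/api/v{version}/entities?{urlencode(params)}"


if __name__ == "__main__":
    for source in ("HDFS", "HIVE"):
        print(build_entities_url(source))
```

Paging with `limit`/`offset` keeps each response bounded when a cluster holds many files or tables; the actual request would also need authentication, which is omitted here.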
Some key features:
• Schema- and field-level metadata for HDFS files is available for self-describing file formats such as Parquet. For other HDFS file formats, field-level metadata is not available
• Asset extraction can be filtered by directory and file for HDFS, and by database and schema for Hive
• The target community in DGC into which assets are loaded can be specified
• Source metadata is mapped to DGC asset types as indicated in the diagram below
• Modifications and deletions of source objects (Cloudera objects) are captured automatically and can be used to trigger actions in DGC, such as asset status changes or asset management workflows
• Full and incremental loads are supported. An incremental load picks up changes since the last successful execution, or since a specific date passed as a parameter
• Metadata loads can be scheduled or triggered on-demand
• A connectivity check template is available for verifying the environment
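The HDFS and Hive filters and the full versus incremental load options listed above can be sketched as a selection step over already-extracted assets. This is a minimal sketch, not the template's logic: the `Entity` record, its fields and the helpers `matches_filter` and `select_for_load` are hypothetical, and Hive filtering is shown at the database level only. A full load corresponds to `since=None`; an incremental load passes either the last successful run time or a user-supplied date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical, simplified view of one extracted Cloudera asset."""
    source_type: str    # "HDFS" or "HIVE"
    path: str           # HDFS path, or "database.table" for Hive
    modified: datetime  # last modification time reported by the source


def matches_filter(e: Entity, hdfs_dirs: Tuple[str, ...], hive_dbs: FrozenSet[str]) -> bool:
    """Keep HDFS assets that are a listed file or sit under a listed directory,
    and Hive assets whose database is listed."""
    if e.source_type == "HDFS":
        return any(e.path == d or e.path.startswith(d.rstrip("/") + "/") for d in hdfs_dirs)
    if e.source_type == "HIVE":
        return e.path.split(".", 1)[0] in hive_dbs
    return False


def select_for_load(entities: Iterable[Entity], hdfs_dirs: Iterable[str],
                    hive_dbs: Iterable[str], since: Optional[datetime] = None) -> List[Entity]:
    """Full load when `since` is None; otherwise only assets modified after `since`."""
    dirs, dbs = tuple(hdfs_dirs), frozenset(hive_dbs)  # materialize once
    selected = [e for e in entities if matches_filter(e, dirs, dbs)]
    if since is not None:
        selected = [e for e in selected if e.modified > since]
    return selected


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    assets = [
        Entity("HDFS", "/data/sales/2024.parquet", t0),
        Entity("HDFS", "/tmp/scratch.csv", t0),
        Entity("HIVE", "finance.ledger", datetime(2024, 3, 1, tzinfo=timezone.utc)),
        Entity("HIVE", "staging.raw", t0),
    ]
    full = select_for_load(assets, ["/data/sales"], ["finance"])
    print([e.path for e in full])  # ['/data/sales/2024.parquet', 'finance.ledger']
    cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
    incr = select_for_load(assets, ["/data/sales"], ["finance"], since=cutoff)
    print([e.path for e in incr])  # ['finance.ledger']
```

Scope filters are applied before the date check, so an incremental load never pulls in assets outside the configured directories or databases.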