Lucid’s Collibra Cloudera integration template loads Cloudera metadata assets into Collibra DGC. Metadata for Cloudera assets (HDFS, Hive) is extracted using the Cloudera Navigator REST APIs.
Some key features:
• Schema- and field-level metadata for HDFS files is available for self-describing file formats such as Parquet. For other file formats in HDFS, field-level metadata is not available
• Asset extraction can be filtered by directory and file for HDFS, and by database and schema for Hive
• Target community in DGC for loading the assets can be specified
• Source metadata is mapped to DGC asset types as indicated in the diagram below
• Modifications and deletions of source Cloudera objects are automatically captured and can be used to trigger actions in DGC such as asset status changes, asset management workflows, etc.
• Support for full load and incremental load. Incremental loads can load changes since the last successful execution or from a specific date passed as a parameter
• Metadata loads can be scheduled or triggered on-demand
• Connectivity check template available for environment verification
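
The choice between a full load and an incremental load described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the template's actual implementation: the function names resolve_load_start and select_changed, the "modified" field, and the dictionary shape of an asset are all hypothetical, and no Cloudera Navigator or Collibra API call is shown.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def resolve_load_start(from_date: Optional[datetime],
                       last_success: Optional[datetime]) -> Optional[datetime]:
    """Pick the cutoff for a load (hypothetical helper).

    An explicit date parameter wins; otherwise fall back to the last
    successful execution. None means "no cutoff", i.e. a full load.
    """
    if from_date is not None:
        return from_date
    return last_success


def select_changed(assets: Iterable[dict],
                   since: Optional[datetime]) -> List[dict]:
    """Keep only assets modified after the cutoff (all assets on a full load).

    Each asset is assumed to be a dict with a "modified" datetime;
    this shape is an assumption made for the example.
    """
    if since is None:
        return list(assets)
    return [a for a in assets if a["modified"] > since]


# Small usage example with made-up assets.
jan = datetime(2024, 1, 1, tzinfo=timezone.utc)
feb = datetime(2024, 2, 1, tzinfo=timezone.utc)
mar = datetime(2024, 3, 1, tzinfo=timezone.utc)
sample = [
    {"name": "sales.parquet", "modified": jan},
    {"name": "orders", "modified": mar},
]

full = select_changed(sample, resolve_load_start(None, None))
incremental = select_changed(sample, resolve_load_start(None, feb))
```

Giving the explicit date priority over the stored last-run time is one reasonable reading of the feature list; a real deployment may resolve the two differently.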