
Data X-Ray for Collibra: Streamlining AI and Unstructured Data Governance

Published by: Ohalo
Latest version: 7.9
Released: May 29, 2024
Partner Offering


Overview

“Partner Offerings” are offerings published by third party Collibra Partners via the Collibra Marketplace. Partners create, own and are responsible for their Partner Offerings. Fees, if any, associated with Partner Offerings are designated and collected by the Partner.  Your use and purchase (if applicable) of Partner Offerings are subject to (a) the terms and conditions referenced on or via link within the Partner Offering listing, or (b) if such terms and conditions are not referenced on the listing, then the Collibra Marketplace License Agreement.  Your Master Agreement with Collibra for the Collibra Service DOES NOT apply to your use of the Partner Offerings (including any warranties, support services and service levels referenced therein).  Collibra may, but is not obligated to provide first level support for customers of the Collibra Service with respect to your use of a Partner Offering.

Data X-Ray’s integration with Collibra Data Catalog offers a sophisticated yet streamlined approach to managing unstructured data and governing AI systems. This partnership delivers a comprehensive solution, transforming the way organizations approach their data governance challenges:

Automated Discovery and Classification: Data X-Ray automates the identification and cataloging of unstructured data within the Data Catalog. This reduces manual efforts, enhances data privacy, and ensures ongoing compliance with less effort.

Enhanced Data Privacy and Compliance: The integration facilitates advanced privacy measures and risk management by automating the identification of personal data across data estates. It keeps the catalog of files and folders updated, aligning with both regulatory requirements and internal data policies.

Dynamic Metadata Management: Data X-Ray ensures that the Data Catalog is continuously updated with the latest metadata from data discovery. This includes business category mapping and physical location, ensuring data governance policies and compliance requirements are always current.

Key Use Cases Addressed:

  • Regulatory Compliance and Records/File Retention: Automates compliance processes for various regulations, ensuring documents across on-premise, cloud applications, and file storage meet external regulations and internal policies.
  • Data Migrations and Storage Cost Reduction: Identifies document categories during data movements and migrations, aiding in decisions on what to keep, archive, or delete to optimize storage costs.
  • AI Governance: Supports the curation of AI workspaces with role-based access, advanced monitoring, audit trails, and data lineage to ensure compliance with governance frameworks.

By integrating Data X-Ray with Collibra Data Catalog, organizations can expect a significant reduction in the time and effort required for unstructured data governance processes. This collaboration not only simplifies data management but also empowers organizations to navigate the complexities of AI and unstructured data governance with greater efficiency and confidence.

Release Notes

Categorization now possible with Ollama models

With Ollama support, you can reduce inference cost and increase privacy and security when categorizing documents. Find the configuration in Data X-Ray Console.

More options when integrating with Collibra

In the datasource settings, you can now configure your Collibra integration to synchronize only the documents that are relevant to you. The selection can be based on category, age, owner, sensitive information, and much more. On the same page, you can also configure the integration to run every time the datasource finishes a new scan. Keeping your data in Collibra up to date is easier than ever!

This configuration is compatible with settings profiles, allowing you to configure many datasources with a single setting profile.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements

Release History

Version 7.8
June 12, 2024
Release Notes

Skip folders from classification

Avoid scanning unnecessary files and achieve faster results with our new folder-skipping filter. Exclude as many folders as you need while performing a classification scan. Find this feature in the datasource settings or the settings profile. It’s now available for SMB drives, with more datasource types to be supported in future releases.

New connector: Google Shared drives

Great news for our Google Workspace users! Data X-Ray now supports Google Shared Drives with its native connector. Scan personal and shared drives across your organization to discover and classify sensitive information.

Better document reviews

The revamped document review interface offers a range of enhancements, including:

  1. Sleeker interface
  2. Clickable annotated findings for easy navigation
  3. Comprehensive list of annotators, even those enabled but with no findings
  4. Direct document categorization for improved organization
  5. Access and review larger files
  6. Instant annotation overlays onto document content, without the need to re-run NLP scans

Easier Documentation Access

From this release onwards, Data X-Ray documentation is integrated directly into the product itself. Easily consult the latest features and functionalities of your specific Data X-Ray version without the need to navigate away or re-authenticate.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.7
June 12, 2024
Release Notes

Simplifying Document Categorization configuration in Data X-Ray Console

Leveraging Large Language Models, Data X-Ray enhances your document categorization capabilities. It intelligently identifies the type of your document, be it an invoice, NDA, CV, or something else.

To make this feature more accessible, we’ve centralized its configuration within Data X-Ray Console. Administrators can now easily select the desired model and adjust settings. Moreover, we’ve set a default limit of 100 documents per categorization. This limit can be easily adjusted in the settings should your needs exceed the default.

Running jobs are only visible by organizational unit

This release changes the visibility of running jobs for labels and categorizations. This information is useful for knowing how busy a Data X-Ray instance is with other jobs, but previously you could potentially see information from other organizational units. Now Data X-Ray only shows you the jobs from the last 30 days that were started by users in your organization. To help you understand whether Data X-Ray is busy with other requests, we display a summary of all accepted and running jobs.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.6
June 12, 2024
Release Notes

Seamless document categorization

Data X-Ray offers seamless document categorization using annotators. While this method effectively identifies documents containing Personally Identifiable Information (PII), it may struggle to differentiate between various document types, such as invoices and CVs.

To address this, Data X-Ray now utilizes Large Language Models to enhance document categorization. With the new document categorizer, you can classify documents into customizable categories like invoices, NDAs, or CVs based on their unique characteristics.

Newest data connector: Box

Data X-Ray adds Box to the list of supported native connectors. This integration empowers you to connect any Box user accounts to Data X-Ray to start gaining insights into the stored information. It allows you to easily identify and mitigate sensitive data exposure in Box, thereby strengthening your defense.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.5
June 12, 2024
Release Notes

Configure all your datasources in a few clicks!

Whether you manage a dozen or a few thousand datasources, until now you had to configure them individually, even when they should share the same settings. Today we present Settings Profiles for datasources: a profile to configure datasource access, discovery, and classification settings. From the settings profiles page, you can quickly add or remove datasources you own to apply the same settings to them.

Connect Data X-Ray findings to Collibra Data Intelligence Platform

Keep a holistic view of your data inside the Collibra Data Intelligence Platform. We are revamping our integration with Collibra, available now in Technology Preview. The integration synchronizes all the files discovered in a datasource, their folder structure, and the annotators found in the classification phase.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.4
June 12, 2024
Release Notes

Smart labels promoted to general release!

Smart labels, our feature to automatically label documents based on annotators and other metadata, has graduated from Technology Preview into General Availability. All its functionality is available by default on upgrades and new deployments.

Refreshed annotators

We have renamed rules to annotators, as we feel the name better fits what they do. The create and edit forms for annotators have also been simplified. Data X-Ray now supports regular expressions across multiple lines, which is practical for detecting data in CSV-like files.
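As a rough illustration of why multi-line matching helps with CSV-like files, the sketch below uses Python's standard `re` module. It is not a Data X-Ray annotator definition: the pattern, the sample data, and the function name are all hypothetical, and the product's own regular expression syntax may differ.

```python
import re

# Hypothetical pattern: a CSV record whose quoted address field contains a
# line break, followed by a 9-digit identifier. re.MULTILINE lets ^ and $
# anchor at each line boundary rather than only at the ends of the text.
PATTERN = re.compile(
    r'^(?P<name>[^,\n]+),"(?P<address>[^"]*\n[^"]*)",(?P<id>\d{9})$',
    re.MULTILINE,
)

# Illustrative sample: the first record spans two lines, the second does not.
SAMPLE = (
    'name,address,id\n'
    'Jane Doe,"1 Main St\nSpringfield",123456789\n'
    'John Roe,"2 Oak Ave",987654321\n'
)


def find_multiline_records(text):
    """Return (name, id) pairs for records whose address spans two lines."""
    return [(m.group("name"), m.group("id")) for m in PATTERN.finditer(text)]
```

A single-line pattern would never see the first record as one unit, because its address field is split by a newline.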

Role based authentication for AWS S3 datasources

Previously, you could connect your AWS S3 datasources to Data X-Ray using the access keys of an IAM user. We are now enhancing the security of the connection by supporting role-based authentication via Instance Profiles. With this method, no access keys are needed, which also makes it easier to maintain than static access keys.

Regulate file downloads on scans

The number of downloaded files that remain waiting for the next step on the document scan pipeline can be regulated. Prevent running out of space when scanning large datasources with complex file parsing.
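The general idea of capping how many downloaded files wait for the next stage can be sketched with a bounded queue. This is a generic backpressure pattern using Python's standard library, not Data X-Ray's pipeline. The function name, the `max_pending` parameter, and the stand-in "parsing" step are assumptions made for illustration.

```python
import queue
import threading


def run_pipeline(file_names, max_pending=2):
    """Download-then-parse sketch where at most max_pending files wait."""
    pending = queue.Queue(maxsize=max_pending)  # put() blocks when full
    processed = []
    peak = 0  # largest number of waiting files observed by the downloader

    def downloader():
        nonlocal peak
        for name in file_names:
            pending.put(name)  # blocks while max_pending files are waiting
            peak = max(peak, pending.qsize())
        pending.put(None)  # sentinel: no more files

    def parser():
        while True:
            name = pending.get()
            if name is None:
                break
            processed.append(name.upper())  # stand-in for real file parsing

    threads = [threading.Thread(target=downloader), threading.Thread(target=parser)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, peak
```

Because the downloader blocks once the queue is full, disk usage for waiting files stays bounded even when parsing is slow.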

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.3
June 12, 2024
Release Notes

Classification scan options to reduce false positives

This version of Data X-Ray comes with two enhancements to help reduce the number of false positives on data classification:

Credit card annotators have Luhn check validator active.

If you have activated any of the credit card annotators maintained by Data X-Ray on any datasource, your next classification scan may take longer. The reason for this is Data X-Ray needs to classify every document’s content with the updated version of the annotator.
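For readers unfamiliar with the Luhn check, the following minimal sketch shows the standard checksum in Python. It illustrates the well-known algorithm only. How Data X-Ray applies it inside its annotators is not described here, and the function name is an assumption.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

A 16-digit string that merely looks like a card number fails this check most of the time, which is how the validator reduces false positives.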

Disable natural language annotators on non-natural language documents.

Data X-Ray has different types of annotators. One type is the natural language annotator, which employs Artificial Intelligence to identify data such as people’s names and surnames, company names, locations, etc.

Because those annotators were trained on documents that contain natural language (such as this description), they have some difficulty correctly identifying data in non-natural language documents (such as tabular data in CSV files).

Natural language annotators can now be disabled on non-natural language documents, as determined by the document MIME type. See the Upgrade Instructions section to learn how to enable this feature.
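The decision described above can be pictured as a simple filter keyed on MIME type. The sketch below is purely illustrative: the set of MIME types, the annotator names, and the `kind` labels are assumptions, not Data X-Ray configuration values.

```python
# Assumed set of MIME types treated as non-natural language; illustrative only.
NON_NATURAL_LANGUAGE_MIME_TYPES = {
    "text/csv",
    "application/json",
    "application/vnd.ms-excel",
}


def select_annotators(mime_type, annotators, skip_nl_on_structured=True):
    """Return the annotator names to run for a document.

    `annotators` is a list of (name, kind) pairs, where kind is either
    "natural_language" or "pattern" (hypothetical labels).
    """
    structured = mime_type.lower() in NON_NATURAL_LANGUAGE_MIME_TYPES
    return [
        name
        for name, kind in annotators
        if not (skip_nl_on_structured and structured and kind == "natural_language")
    ]
```

With the flag off, every annotator runs on every document, which matches the behavior before this option existed.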

Scans got faster

Discovery scans got faster, as they now use a shorter version of the classification analytical pipeline.

We tuned the classification scans to make them more efficient. Several updated recommended settings are available in the Upgrade Instructions section.

More filters to search your data

Data X-Ray offers filters for metadata that it detects or generates, such as mime type, document language, or document status.

File size and document date filters have been enhanced so that it is easier to list all documents larger than a given number of megabytes or older than a given number of years.

Classic labels become standard labels

We have renamed classic labels as standard labels. We expect this naming change to help with onboarding users. There is no change in their functionality.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.2
June 12, 2024
Release Notes

File Activity Monitoring

File Activity Monitoring is available in Technology Preview.

File Activity Monitoring (FAM) detects events that happen on monitored datasources, such as file updates or accesses. It captures information about what happened, such as what action was taken on the file and which user performed the action.

Data X-Ray captures those file events and enhances their information with file sensitivity, described with Data X-Ray labels. It then logs the information. This log can be consumed by SIEMs and can also be integrated into Imperva DSF to review trends and create alerts on files, events, sensitivity, and more.

Native support of Multi Factor Authentication (MFA)

Previously, Data X-Ray offered Multi-Factor Authentication through Azure AD integration. Version 7.2 brings Multi-Factor Authentication to our default username and password login as well.

An administrator can require all users to use MFA when logging in to the app or to the console. Users will need to configure an authentication application by scanning a QR code and providing a TOTP code. To enable this feature, follow the instructions in the technical release notes.
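For context on what a TOTP code is, here is a minimal sketch of the standard time-based one-time password computation (RFC 6238, HMAC-SHA1) in Python. It shows what an authenticator app generally computes from the secret in the QR code. It says nothing about Data X-Ray's own implementation, and the function and constant names are assumptions.

```python
import base64
import hashlib
import hmac
import struct

# Test key from RFC 6238 (ASCII "12345678901234567890"), base32-encoded the
# way authenticator apps usually receive secrets. Never use it in production.
RFC_TEST_SECRET = base64.b32encode(b"12345678901234567890").decode("ascii")


def totp(secret_b32, unix_time, step=30, digits=6):
    """Compute a TOTP code for the given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int(unix_time) // step  # number of time steps since the epoch
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in HOTP (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server and the authenticator app derive the same code independently from the shared secret and the current time, so only the short code travels at login.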

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.1
June 12, 2024
Release Notes

File Activity Monitoring (Partner Early Access)

We are opening Partner Early Access to our new File Activity Monitoring functionality.

File Activity Monitoring (FAM) detects events that happen on monitored files, such as file updates or accesses. It captures information about what happened, such as what action was taken on the file and which user performed the action.

Data X-Ray captures those file events and enhances their information with file sensitivity, described with Data X-Ray labels. It then logs the information. This log can be consumed by SIEMs and can also be integrated into Imperva DSF to review trends and create alerts on files, events, sensitivity, and more.

More dashboard metrics!

We have promoted the Data X-Ray general dashboard as a General Availability feature. In this release, we have updated our dashboards with three new metrics:

  • Excluded files: How many files have been excluded from classification, because they match the classification exclusion filters or because Data X-Ray could not extract its content, for example, due to password protection.
  • Data sources with classification enabled: Quickly know if all your data sources have the classification phase enabled so you can detect sensitive content on their files.
  • Data sources with label configuration: This graph tells you if your labels, including smart labels, are targeting all your data sources.

Detect duplicate files (Technology preview)

IT administrators waste millions every year storing redundant data like duplicate files. In this release, Data X-Ray can detect duplicate files based on identical content, file size, and file type.

It is currently an opt-in technology preview feature and it works with up to 10k files. Future releases will increase the maximum number of files.
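One common way to detect duplicates by content, size, and type is to group files on a composite key. The sketch below assumes a SHA-256 content hash and uses the file extension as a stand-in for file type. Both choices, along with the function name, are illustrative assumptions and not a description of Data X-Ray's method.

```python
import hashlib
from collections import defaultdict
from pathlib import PurePath


def find_duplicates(files):
    """Group file names that share size, extension, and content hash.

    `files` maps a file name to its content as bytes.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        key = (
            len(content),                        # same file size
            PurePath(name).suffix.lower(),       # same type (by extension here)
            hashlib.sha256(content).hexdigest(), # same content
        )
        groups[key].append(name)
    # Keep only groups with more than one member, in a stable order.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Comparing size and type first is cheap, so a real implementation could skip hashing any file whose size is unique.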

ChatGPT summaries on demand (Technology preview)

When data privacy professionals drill down on a file with sensitive data in it, they first need more context around the sensitive data hits and a better understanding about the file’s contents. Fortunately, Large Language Models (LLMs), like those used by ChatGPT and LLaMA have proven to be incredible at synthesizing massive amounts of unstructured data. For Data X-Ray Cloud users, and those that have access to OpenAI’s APIs, we have added document summaries via ChatGPT. Just click the summarize button and let AI do the work!

Stay tuned for more AI features coming soon!

File entitlements on SMB connectors

Two new metadata fields are available on SMB connectors that relate to file entitlements:

  • FILE_ENTITLEMENTS_USERS_AND_GROUPS_ALLOWED_READ_ACCESS: A list of users and groups that have read access on the file.
  • FILE_ENTITLEMENTS_USERS_AND_GROUPS_DENIED_READ_ACCESS: A list of users and groups that are specifically denied read access on the file.

With these metadata fields, you can easily search for all the files that a user can access, directly in the Data X-Ray UI. They are also available in CSV exports.
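As a sketch of how these two fields might be used outside the UI, the snippet below filters a CSV export for files readable by a given set of users and groups. Only the two field names come from the release notes. The `path` column, the semicolon separator inside a cell, and the rule that an explicit deny overrides an allow are assumptions made for illustration.

```python
import csv
import io

ALLOWED = "FILE_ENTITLEMENTS_USERS_AND_GROUPS_ALLOWED_READ_ACCESS"
DENIED = "FILE_ENTITLEMENTS_USERS_AND_GROUPS_DENIED_READ_ACCESS"

# Hypothetical export: the "path" column and ";" separator are assumptions.
SAMPLE_EXPORT = (
    f"path,{ALLOWED},{DENIED}\n"
    "/hr/salaries.xlsx,hr-team;alice,\n"
    "/hr/reviews.docx,hr-team,alice\n"
    "/public/menu.pdf,everyone,\n"
)


def _principals(cell):
    """Split a cell into a set of lower-cased user and group names."""
    return {p.strip().lower() for p in cell.split(";") if p.strip()}


def readable_files(csv_text, principals):
    """Return paths readable by any of `principals` and denied to none."""
    wanted = {p.lower() for p in principals}
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        allowed = _principals(row[ALLOWED])
        denied = _principals(row[DENIED])
        if wanted & allowed and not wanted & denied:
            result.append(row["path"])
    return result
```

Passing a user together with the groups they belong to, for example `["alice", "hr-team"]`, approximates that user's effective read access under these assumptions.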

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 7.0
June 12, 2024
Release Notes

Performance improvements when scanning thousands of datasources

Some customers need to scan and index hundreds or thousands of datasources. Those numbers can be reached when, for example, a customer needs to scan their employees’ OneDrive accounts or SharePoint sites. In previous releases, we have been improving the performance of our APIs to better support this number of datasources. We now bring an improvement to the data layer, in how we store file and datasource information in the Data X-Ray index.

Overview dashboard, our new welcome page

Today we open our new welcome page! From there you can see at a glance how many files Data X-Ray is managing and how all your datasources are configured. When you deal with many datasources, it is easy to have some configuration gaps. This page helps identify any configuration gaps and files that could not be classified.

This feature is under Technology Preview and it needs to be activated when upgrading Data X-Ray.

We capture file owner and group information from files in SMB shares

We have improved our SMB connector to capture the file owner and group. The new metadata fields are OWNER_USER and OWNER_GROUP. They are available for on-prem file servers. Azure File Storage is not supported yet.

We read Site Id and Site URL on OneDrive and SharePoint (Graph API) datasources

New OneDrive or SharePoint Online datasources added to Data X-Ray fetch their site id and site url and display them on the About datasource widget. We forward this information to Imperva DSF with the file findings.

We show better messages when a file could not be classified

Sometimes files cannot be classified. There are many possible reasons for this: from being filtered out on the datasource settings to the file being corrupt or password protected. With the improved messages, you are better informed of the reasons why a file could not have its content analyzed, so you can act on them.

You don’t need to select a datasource when searching for metadata

On the search page there is a metadata filter that allows you to search by any metadata field. In previous versions, you had to select the datasource you would like to search by. This was problematic for two reasons: a) it didn’t support multiselect, b) there is an external datasource selector that may make your search invalid. We are now removing the need to select a datasource on the metadata filter. If you want to narrow down your search by datasource, use the general datasource filter that supports multiselection. And voilà!

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 6.15
June 12, 2024
Release Notes

If you are interested in knowing the stale data in your datasource first and analyzing the sensitivity of the content in later scans, you can do it much faster now

Sometimes our customers would like to do a first scan of their data to identify stale, obsolete or trivial data that can be detected through file metadata. Previously, they needed to go through both discovery and classification, which made the whole process take much longer than needed.

We are taking this opportunity to make it configurable per datasource whether to run a discovery scan or a discovery and classification scan, and also to revamp the datasource settings page. We are upgrading the settings page to make each section easier to reach. All these sections are readable by L1 – L3 users, and only L4 users can update the settings.

Under the General section we can update the datasource name and its connection details, as well as removing all its data. In the Access section we can configure different access levels to different users.

In the Discovery section we configure and start a scan of the datasource. Discovery data that is fetched on that part of the process is metadata about the file such as filename, path or last modified date.

The Classification section is where we configure how to classify the content of the files. The main configuration is which rules we want to identify, but also which exclusion filters to apply in order to skip downloading and analyzing the content of the files. The Integration section configures the Collibra integration. The Exports section is where we can export a full or anonymized version of the content of the datasource (if we have at least L3 permission).

⚠️ We have moved the thread scheduler section to the console, as it is a global setting. Users configuring the thread scheduler should have the new RBAC role Download Thread Scheduler applied (this role is not part of the Organizational Admin role).

⚠️ We are deprecating the mimetype exclusion filter. Users should use the file extension exclusion filter instead.

Preserving last time access in SMB datasources

When Data X-Ray’s SMB connector connects to a network drive and starts downloading files, it can potentially (depending on the datasource server’s settings) alter the date accessed of a file. Preserving the last access time is key to help you understand what files haven’t been accessed by your users recently. With that in mind, our SMB connector has been enhanced to preserve the date last accessed of a file. When creating a new SMB connection you can modify the datasource settings to:

  • Classify all files without resetting last accessed date (allows for faster scanning)
  • Classify all files and preserve last accessed date on a file when possible (when write permission to the files is available)

If preserving the last access time is configured in the datasource settings but Data X-Ray has no write permission to the file, we neither download nor classify the document in the first place. This avoids updating the last access time to the time Data X-Ray scanned it, and so avoids losing the record of when a user last accessed the file.
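The behavior described above can be illustrated with a local-filesystem analogy in Python: record the timestamps, read the file, then restore them, and skip the file entirely when it is not writable. This is only a sketch of the idea. The SMB connector's actual mechanism is not documented here, and the function names are hypothetical.

```python
import os
import tempfile


def read_preserving_atime(path):
    """Read a file and restore its original access time afterwards.

    Returns None without reading when the file is not writable, mirroring
    the "skip rather than lose the last access time" rule described above.
    """
    if not os.access(path, os.W_OK):
        return None
    before = os.stat(path)
    with open(path, "rb") as handle:
        data = handle.read()  # this read may update the access time
    # Put back the original access and modification times (nanoseconds).
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data


def demo():
    """Create a temp file with an old access time and check it survives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")
        with open(path, "wb") as handle:
            handle.write(b"quarterly numbers")
        old_ns = 1_000_000_000 * 10 ** 9  # a fixed timestamp in 2001
        os.utime(path, ns=(old_ns, old_ns))
        data = read_preserving_atime(path)
        return data, os.stat(path).st_atime_ns == old_ns
```

Restoring timestamps costs an extra write per file, which is consistent with the note that the non-preserving option allows faster scanning.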

Datasource connection details are now fully editable

Under the General section of the new datasource settings page, we have made all datasource connection details editable. Sometimes you need to update a user or password due to password rotation policies; now you can easily update them in Data X-Ray. We have also made the name of the datasource editable. No more typos in the datasource name!

Datasource detection rules can be enabled or disabled in bulk by type

When configuring the detection rules on a datasource, you could previously enable or disable each rule individually, or enable or disable them all. Now Data X-Ray offers more flexibility, allowing you to filter the rules by name or type and enable or disable the filtered results.

Smart label queries now support all your datasources!

Smart labels are still in Technology Preview and we have enhanced their queries to support one, two or all of your datasources at once. When creating or updating a smart label, we can choose from all our datasources that are scanned or scanning, so newly created datasources that are not yet scanned will not appear in the datasource selector.

We can also select from all the metadata we know of and all the rules we have in Data X-Ray, regardless of what datasources are selected.

It is now possible to see what datasources each label is targeting, or can be applied to. Classic labels can be applied to all datasources. Smart labels can be applied to the datasources selected in any of their queries. This is a quick view to know if you have all your (smart) labels correctly configured covering all the datasources you care about.

Improved Imperva DSF Integration

The previous Imperva DSF integration scanned whole datasources as a separate process from Data X-Ray. That meant a user using both Data X-Ray and Imperva DSF to see file results would need to scan every datasource twice. Moreover, the Imperva DSF integration did not have the concept of smart scans on deltas, so it would scan everything again every 24h after the last scan. We have now developed a new endpoint that Imperva DSF can consume: instead of doing a separate scan of the datasource, it reads directly from Data X-Ray scans. Data flows faster towards Imperva DSF.

⚠️The current solution works when the datasources in Data X-Ray don’t have discovery monitoring enabled. We have another roadmap initiative to make this feature compatible with discovery monitoring.

Performance dashboards

We have added telemetry to the different modules in our tech stack, and system health is now visible in performance dashboards. The target audience is system admins and our customer success team.

Those dashboards show how Elasticsearch and other parts of the application are performing and which modules, if any, are under too much stress, so customer success can identify bottlenecks in the running application.

Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 6.11.4
June 12, 2024
Release Notes
  • Luhn Check support on regular expression rules
  • CSV Export enhancements
  • Date metadata can be filtered with open and close date ranges
  • Datasource index status now shows current status
  • Updated widget for month / year selector when selecting dates

Technical notes:

  • Several updates to the configuration files
  • Clean up existing Smart Labels – If you are already using Smart Labels, you will need to delete existing Smart Labels (steps provided) and create them again (manually)
Compatibility
  • AWS
  • GCP
  • Kubernetes
  • ARO
  • Azure
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Podman
License and Usage Requirements
Version 4.0.0
June 12, 2024
Release Notes
  • More export options and reporting features around exporting
  • Export of entire datasources
  • More control of auto-redaction features for unstructured data for better accuracy
  • General bug fixes
  • Several performance improvements in datasource crawling
  • Deprecation of casefile support for structured databases (may return in future release)
Compatibility
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • Collibra API v2
License and Usage Requirements
Version 3.12.6
June 12, 2024
Release Notes

Features

  • Dashboards generated on an ongoing basis as the file system changes instead of needing to be refreshed
  • Many new redaction export options including 2 pass filters and enhanced name recognition
  • More tagging available in search screen
  • API support for programmatic file and text extraction
Compatibility
  • Data X-Ray 3.12.6
  • Collibra Data Intelligence Cloud
  • Collibra Data Intelligence On-Prem
Dependency
  • N/A
License and Usage Requirements

Support is provided via email ([email protected]), phone (+1 917 633 7719 or +44 330 222 0016), and an in-app chat box.

  • Basic support included (8:00-17:00 UK business hours)
  • Premium support available at additional cost

Licensed according to number of accounts identified in the Directory Management Service Tenant as a human user (service accounts are excluded). This is queried directly in the directory management service. Please contact [email protected]
