Databricks open sources Unity Catalog: Will it usher in a new era for data and AI? - Impetus

June 2024

On June 12, 2024, Databricks made waves by open-sourcing Unity Catalog under the Apache 2.0 license, making it free for anyone to use. The catalog is designed to govern data and AI assets across clouds, data types, and platforms. By embracing open systems, Databricks aims to help customers avoid vendor lock-in and take control of their data future.

As Databricks leads the charge toward open data and AI catalog standards, this blog explores how the move, backed by major players such as Amazon Web Services, Google Cloud, Microsoft, NVIDIA, and Salesforce, could reshape data and AI for enterprises.

Unity Catalog Open Source Software (OSS) offers a universal interface designed to work with virtually any data format and compute engine. Through Delta Lake UniForm, users can access tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients. Additionally, it supports the following interface standards:

  • Iceberg REST Catalog
  • Hive Metastore (HMS)
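
The REST catalog interface is what lets a standard Iceberg client talk to a catalog without engine-specific glue. As a minimal sketch of what such a client sends, the Python below builds the URL for the Iceberg REST specification's `loadTable` operation (`GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`), including the spec's rule that multi-level namespace parts are joined with the 0x1F unit-separator byte before URL-encoding. The base URL, the `load_table_url` helper, and the example prefix are hypothetical; a real client would typically take any prefix from the server's `/v1/config` response instead of hard-coding it.

```python
from urllib.parse import quote

# Hypothetical base URL for an Iceberg REST endpoint; the real value depends
# on how the catalog server is deployed.
BASE_URL = "http://localhost:8080/iceberg"

# The Iceberg REST spec joins multi-level namespace parts with the 0x1F
# "unit separator" byte, which is then URL-encoded as %1F.
NAMESPACE_SEPARATOR = "\x1f"


def load_table_url(base_url: str, namespace: list, table: str, prefix: str = "") -> str:
    """Build the URL for GET /v1/{prefix}/namespaces/{namespace}/tables/{table}."""
    if not namespace:
        raise ValueError("namespace must have at least one level")
    encoded_ns = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    encoded_table = quote(table, safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # The optional prefix is inserted verbatim between /v1 and /namespaces.
        parts.append(prefix.strip("/"))
    parts += ["namespaces", encoded_ns, "tables", encoded_table]
    return "/".join(parts)


if __name__ == "__main__":
    # Two-level namespace "sales.default" becomes "sales%1Fdefault" in the path.
    print(load_table_url(BASE_URL, ["sales", "default"], "orders"))
```

Because the path shape comes from the shared specification rather than from any one vendor, the same client code can in principle point at any catalog that implements the Iceberg REST API.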

Unity Catalog OSS also brings unified governance to tabular and non-tabular data as well as to AI assets such as ML models and GenAI tools, helping organizations govern data at scale.

Unity Catalog Open Source: Core features and new enhancements

Unity Catalog OSS continues to evolve with new enhancements while maintaining its foundational features. Here’s an overview of its core features and the latest additions.

  • Open-source API implementation: Based on the OpenAPI specification, Unity Catalog OSS is open-sourced under Apache 2.0 and is compatible with the Apache Hive metastore API and Apache Iceberg’s REST catalog API
  • Multi-format and multi-engine support: Unity Catalog OSS supports many data formats, including Delta Lake, Apache Iceberg via UniForm, Apache Parquet, CSV, JSON, and more. Its open APIs ensure seamless access by virtually any compute engine
  • Multimodal and unified management: Supports diverse data and AI assets, including tables, files, functions, and AI models, all managed in one place within Unity Catalog
  • Vibrant ecosystem: Community-driven with support from AWS, Microsoft Azure, Google Cloud, NVIDIA, Salesforce, DuckDB, LangChain, dbt Labs, Fivetran, Confluent, Unstructured, Onehouse, Immuta, Informatica, and many more
  • Iceberg REST Catalog API: Implements the Iceberg REST Catalog API, giving engines in the Iceberg ecosystem easy access and drawing on experience from Tabular
  • Credential vending: The catalog server issues temporary, scoped credentials that clients use to reach the underlying cloud storage, so access control stays centralized in the catalog
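
Credential vending is easiest to follow as a flow: the client asks the catalog for access, the catalog checks its grants, and only then hands back a short-lived credential scoped to the table's storage location. The Python below is a minimal in-memory sketch of that pattern under stated assumptions. `ToyCatalogServer`, `storage_allows`, the `READ`/`READ_WRITE` levels, and the 15-minute default lifetime are hypothetical illustrations, not the Unity Catalog OSS API, and a real deployment would rely on the cloud provider's token service rather than random hex strings.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical access levels; a higher level covers the lower one.
LEVELS = {"READ": 1, "READ_WRITE": 2}


@dataclass(frozen=True)
class TemporaryCredential:
    token: str         # random, short-lived secret handed to the client
    path_prefix: str   # storage location the token is scoped to
    operation: str     # "READ" or "READ_WRITE"
    expires_at: float  # Unix timestamp after which storage rejects it


class ToyCatalogServer:
    """Hypothetical in-memory catalog server that vends scoped, expiring credentials."""

    def __init__(self, ttl_seconds: float = 900.0) -> None:
        self.ttl_seconds = ttl_seconds
        self.table_locations: Dict[str, str] = {}
        self.grants: Dict[Tuple[str, str], str] = {}  # (principal, table) -> level

    def register_table(self, table: str, location: str) -> None:
        # Trailing slash so ".../orders" never matches ".../orders_archive".
        self.table_locations[table] = location.rstrip("/") + "/"

    def grant(self, principal: str, table: str, operation: str) -> None:
        self.grants[(principal, table)] = operation

    def vend(self, principal: str, table: str, operation: str) -> TemporaryCredential:
        # The permission check happens here, centrally, before any storage access.
        granted = self.grants.get((principal, table))
        if granted is None or LEVELS[granted] < LEVELS[operation]:
            raise PermissionError(f"{principal} may not {operation} {table}")
        return TemporaryCredential(
            token=secrets.token_hex(16),
            path_prefix=self.table_locations[table],
            operation=operation,
            expires_at=time.time() + self.ttl_seconds,
        )


def storage_allows(cred: TemporaryCredential, path: str, operation: str,
                   now: Optional[float] = None) -> bool:
    """The check a storage service would apply to a vended credential."""
    now = time.time() if now is None else now
    if now >= cred.expires_at:
        return False  # expired
    if not path.startswith(cred.path_prefix):
        return False  # outside the table's storage location
    return LEVELS[cred.operation] >= LEVELS[operation]
```

In this toy, removing a grant from the catalog blocks new credentials immediately, and any credential already issued stops working once it expires, so no per-user permissions ever need to be configured on the storage bucket itself.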

By integrating these core features and new enhancements, Unity Catalog OSS provides a robust, extensible, and community-supported platform for managing diverse data and AI assets, making it a vital tool for modern data ecosystems.

Why multimodal data and AI catalog? Because single-mode governance is dead!

In the ever-accelerating world of data and AI, governance can’t afford to be one-dimensional. Enter Unity Catalog OSS, Databricks’ multimodal governance tool for data and AI. As AI technologies move rapidly into mainstream applications, the need for a unified approach to managing structured and unstructured data has never been clearer.

Anticipating this need, Databricks launched Unity Catalog three years ago to offer a consistent governance model for data and AI. Today, thousands of customers rely on its features to streamline their operations:

  • Single namespace: Smash the silos. Unite tables, unstructured data, and AI assets under one cohesive namespace.
  • Centralized audit: Total control. Maintain comprehensive logs of all data and AI activities, ensuring transparency and compliance.
  • Unified lineage: Connect the dots. Achieve complete lineage across data and AI workloads, simplifying asset tracking and management.
  • Cross-organization collaboration: Break boundaries. The open-source Delta Sharing protocol lets organizations share live data securely with one another, driving innovation and efficiency.
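
Unity Catalog addresses assets through a three-level namespace, `catalog.schema.asset`. The Python below is a hypothetical toy model, not Unity Catalog's own API, that illustrates the idea behind the single namespace and centralized audit: tables, volumes of unstructured files, and ML models share one addressing scheme, and every access passes through one place that records it. The `ToyUnifiedCatalog` class, the asset kinds, and the action names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class Asset:
    kind: str      # e.g. "table", "volume", "model" (illustrative kinds)
    location: str  # where the underlying bytes live (illustrative)


@dataclass
class ToyUnifiedCatalog:
    """Hypothetical registry: every asset has one catalog.schema.name address."""
    assets: Dict[str, Asset] = field(default_factory=dict)
    # Each entry: (UTC timestamp, principal, action, full_name)
    audit_log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    @staticmethod
    def _check_name(full_name: str) -> str:
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.name, got {full_name!r}")
        return full_name

    def register(self, full_name: str, asset: Asset) -> None:
        self.assets[self._check_name(full_name)] = asset

    def access(self, principal: str, full_name: str, action: str) -> Asset:
        asset = self.assets[self._check_name(full_name)]
        # One audit trail for every asset type, recorded where access is resolved.
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), principal, action, full_name)
        )
        return asset

    def list_kind(self, kind: str) -> List[str]:
        return sorted(name for name, a in self.assets.items() if a.kind == kind)
```

Because a model and a table are looked up the same way, audit and lineage tooling can treat them uniformly instead of maintaining a separate registry per asset type.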

Is Unity Catalog OSS here to stay?

Unity Catalog OSS looks less like a passing trend than a significant step forward in data and AI governance. Here’s why it is likely to last:

  • Open source at the core: Continuous updates and global adoption ensure Unity Catalog OSS remains at the forefront of technological advancements and adapts to users’ evolving needs.
  • Interoperability: Supporting various data formats and computing engines frees organizations from vendor lock-in, promoting true flexibility and choice.
  • Unified governance: Managing structured data, unstructured data, and AI assets under a single model streamlines operations and boosts efficiency, making governance comprehensive and straightforward.
  • Widely supported ecosystem: With support from industry leaders like AWS, Azure, and Google Cloud, Unity Catalog OSS is well positioned to become a critical component of the data and AI landscape.
  • API design principles: Its API-driven architecture ensures seamless integration and scalability, ready to accommodate future technological evolutions.
  • Established adoption: Thousands of customers benefit from its strong governance capabilities, underscoring its reliability and practical value.

Additionally, Impetus’ Unity Catalog Migration Accelerator (UCMA) complements the ecosystem by streamlining the transition to Unity Catalog, helping organizations adopt it with minimal disruption and maximum efficiency.

Unity Catalog OSS’ open-source foundation, extensive interoperability, unified governance, broad industry support, and proven track record, complemented by the UCMA, position it as a lasting innovation poised to shape the future of data and AI governance.

Authors

Atharv Sakalley

Atharv is an analytics engineer with four years of experience in the healthcare industry. He specializes in machine learning, deep learning, and natural language processing, applying AI to improve business outcomes. He is proficient in data extraction, data analysis, and the deployment of scalable systems. Atharv has demonstrated expertise in developing AI chatbots and has worked across multiple domains.
