Simplifying Data Compatibility with Universal Format: Introduction to Data Lakes


Databricks, known for its Lakehouse architecture, has announced the release of Delta Lake 3.0 (RC1). The open source data lake project has been governed by the Linux Foundation for about a year.

The most important new features in version 3.0 are a universal format (UniForm) designed for broader data compatibility and a more flexible approach to clustering data. With these changes, the Delta Lake development team primarily aims to give users easier integration and higher performance for central data storage and access.

The new universal format, UniForm, automatically generates metadata for the Apache Iceberg and Apache Hudi formats alongside the data stored in Delta Lake. When reading, the data can therefore be treated as if it were stored in Iceberg or Hudi. Users are no longer locked into a single table format, and manual conversion between formats becomes unnecessary.
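As a sketch of how this looks in practice, UniForm is enabled per table via a table property. The following DDL is based on the mechanism described in the Delta Lake documentation; the table name and columns are illustrative, and exact property names may still change before the final 3.0 release:

```sql
-- Create a Delta table and ask UniForm to also emit
-- Iceberg-compatible metadata for it (illustrative example).
CREATE TABLE sales (
  id      INT,
  amount  DOUBLE
)
USING DELTA
TBLPROPERTIES (
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```

An Iceberg-capable engine can then read the same underlying Parquet files through the generated Iceberg metadata, without a copy or conversion job.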

Furthermore, the new Delta Liquid Clustering feature promises to replace rigid Hive-style table partitioning with a more flexible clustering process. This yields higher read and write performance, especially for rapidly growing tables, and also helps to lower costs.
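In DDL terms, liquid clustering replaces a `PARTITIONED BY` clause with a `CLUSTER BY` clause, so the clustering keys can be chosen without fixing a directory layout. A minimal sketch, with an illustrative table; syntax follows the Delta Lake documentation and may still evolve during the release-candidate phase:

```sql
-- Instead of PARTITIONED BY (user_id), declare clustering keys;
-- the engine lays out and re-clusters data files incrementally.
CREATE TABLE events (
  event_time  TIMESTAMP,
  user_id     BIGINT,
  payload     STRING
)
USING DELTA
CLUSTER BY (user_id);
```

Because no fixed partition directories are created, the clustering keys can later be changed without rewriting the whole table, which is what makes the layout suitable for fast-growing data.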

Developers benefit from stabilized APIs thanks to the updated Delta Kernel. The adjustments to Delta Lake connectors that were previously necessary after updates or protocol changes are no longer required. This counteracts the growing fragmentation among connectors and lets users benefit from new data lake functionality more quickly.

Delta Lake 3.0 is now available as a preview version (Release Candidate 1) on GitHub. For more information, visit the project’s GitHub repo and the website.
