Guide 22

Table Format Interoperability — XTable, UniForm, and the End of Format Lock-In

Problem Framing

Organizations that use multiple table formats (Iceberg for analytics, Delta for Spark-native pipelines, Hudi for CDC) face a metadata fragmentation problem. Each format maintains its own transaction log, manifest structure, and partition scheme. Apache XTable translates metadata between any pair of supported formats, while Delta UniForm publishes Iceberg-readable metadata for tables written as Delta. Both let a table written in one format be read as another without duplicating the underlying Parquet data files. Engineers need to understand which features survive translation, which are lost, and how to configure cross-format publishing in production.

Relevant Nodes

  • Topics: S3, Table Formats
  • Technologies: Apache XTable, Delta UniForm, Apache Iceberg, Delta Lake, Apache Hudi
  • Architectures: Interoperability Patterns
  • Pain Points: Vendor Lock-In

Decision Path

  1. Map your current format landscape. Inventory which tables use which formats and which engines read them. The interoperability strategy depends on whether you need one-directional reads (e.g., Trino reading Delta tables as Iceberg) or bidirectional writes.

    • Most organizations have a primary format and need read-only access from engines that support a different format.
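
As a minimal sketch of the inventory in step 1, the snippet below records each table's write format and the formats its readers expect, then groups tables by the translation direction they need. The `TableEntry` type, the function name, and the example table names are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class TableEntry:
    """One row of the format inventory (hypothetical structure)."""
    name: str
    write_format: str               # format the table is written in
    reader_formats: FrozenSet[str]  # formats the consuming engines expect


def required_translations(
    inventory: Iterable[TableEntry],
) -> Dict[Tuple[str, str], List[str]]:
    """Group table names by the (source, target) translation they need."""
    needed: Dict[Tuple[str, str], List[str]] = {}
    for table in inventory:
        # Readers that already understand the write format need nothing.
        for target in sorted(table.reader_formats - {table.write_format}):
            needed.setdefault((table.write_format, target), []).append(table.name)
    return needed


inventory = [
    TableEntry("orders", "delta", frozenset({"delta", "iceberg"})),
    TableEntry("events", "hudi", frozenset({"iceberg"})),
    TableEntry("dim_customer", "iceberg", frozenset({"iceberg"})),
]
directions = required_translations(inventory)
```

Tables that do not appear in the result are read only in their native format and need no translation at all.
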
  2. Choose your translation direction. XTable supports Iceberg-to-Delta, Delta-to-Iceberg, Hudi-to-Iceberg, and other combinations. UniForm is Delta-native and generates Iceberg metadata alongside Delta commits, making Delta tables readable by Iceberg engines.

    • If your primary format is Delta, UniForm is simpler: metadata generation is triggered by Delta commits, so no separate sync job is needed.
    • If you have mixed formats or need Hudi translation, XTable provides broader coverage.
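
The rule of thumb in step 2 can be sketched as a small helper. This is an illustration, not a definitive policy: the function name and return labels are hypothetical, and it assumes UniForm is used only for the Delta-to-Iceberg case described above.

```python
from typing import Iterable


def choose_tool(primary_format: str, reader_formats: Iterable[str]) -> str:
    """Return "none", "uniform", or "xtable" per the rule of thumb in step 2."""
    primary = primary_format.lower()
    # Formats that readers need but the table is not written in.
    targets = {fmt.lower() for fmt in reader_formats} - {primary}
    if not targets:
        return "none"      # every reader already understands the write format
    if primary == "delta" and targets == {"iceberg"}:
        return "uniform"   # Delta-native path, no separate sync job
    return "xtable"        # mixed formats, or Hudi on either side
```

For example, `choose_tool("delta", ["iceberg"])` selects UniForm, while any inventory that involves Hudi falls through to XTable.
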
  3. Configure XTable or UniForm. XTable runs as a post-commit process that reads the source format's metadata and generates equivalent metadata for each target format. UniForm is enabled through Delta table properties and generates Iceberg metadata automatically after Delta commits.

    • XTable requires a scheduled or event-triggered sync job.
    • UniForm needs no separate job, but metadata generation adds work to each Delta write. It typically runs asynchronously after the commit, so Iceberg readers can briefly lag the Delta log.
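
For the XTable side of step 3, the sync job is driven by a dataset configuration file. The sketch below renders such a file from Python. The key names (`sourceFormat`, `targetFormats`, `datasets`, `tableBasePath`, `tableName`) follow the XTable documentation as best understood here and should be checked against the release you run; the helper name, bucket, and table name are hypothetical.

```python
from typing import Iterable, Mapping


def render_xtable_config(
    source_format: str,
    target_formats: Iterable[str],
    datasets: Iterable[Mapping[str, str]],
) -> str:
    """Render a YAML dataset config for an XTable sync run."""
    lines = [f"sourceFormat: {source_format.upper()}", "targetFormats:"]
    lines += [f"  - {fmt.upper()}" for fmt in target_formats]
    lines.append("datasets:")
    for ds in datasets:
        # One list entry per table; paths point at the existing table root.
        lines.append(f"  - tableBasePath: {ds['tableBasePath']}")
        lines.append(f"    tableName: {ds['tableName']}")
    return "\n".join(lines) + "\n"


config_text = render_xtable_config(
    "delta",
    ["iceberg"],
    [{"tableBasePath": "s3://example-bucket/warehouse/orders", "tableName": "orders"}],
)
```

The scheduled or event-triggered job would then write `config_text` to a file and pass it to the XTable utilities runner (the `--datasetConfig` option in the releases this sketch assumes).
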
  4. Understand feature loss in translation. Not all features translate cleanly:

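    • Iceberg hidden partitioning (partition transforms such as daily or bucketed values derived from a column) does not map directly onto the Hive-style, column-value partitioning used by Delta and Hudi.
    • Row-level deletes are encoded differently: Delta uses deletion vectors, while Iceberg uses position and equality delete files in v2 and deletion vectors in v3. These encodings may not be representable in the other format, in either direction.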
    • Time travel semantics differ — snapshot IDs are not preserved across formats.
    • Schema evolution may behave differently (column renaming, type widening rules vary).
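
The list above can be turned into a simple pre-flight check. This is a sketch only: the `TableUsage` flags are hypothetical and must be filled in by whoever audits the table, since nothing here inspects real metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TableUsage:
    """Which translation-sensitive features a table relies on (hypothetical flags)."""
    partition_transforms: bool = False      # Iceberg hidden partitioning
    row_level_deletes: bool = False         # deletion vectors or delete files
    pinned_snapshot_ids: bool = False       # consumers time-travel by snapshot ID
    renames_or_type_widening: bool = False  # schema evolution beyond adding columns


def translation_risks(usage: TableUsage) -> List[str]:
    """Return one warning per feature from step 4 that the table relies on."""
    checks = [
        (usage.partition_transforms,
         "partitioning: transforms may not map to Hive-style partitions"),
        (usage.row_level_deletes,
         "deletes: row-level delete encoding may not be representable in the target"),
        (usage.pinned_snapshot_ids,
         "time travel: snapshot IDs are not preserved across formats"),
        (usage.renames_or_type_widening,
         "schema: rename and type-widening rules vary by format"),
    ]
    return [message for flagged, message in checks if flagged]
```

An empty result does not prove the translation is lossless; it only means none of the four known problem areas were flagged.
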
  5. Test query performance across translated views. Translated metadata may not include all statistics (column min/max, null counts) that the target engine uses for pruning. Benchmark query performance on translated tables against native tables to quantify any pruning degradation.
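
A minimal timing harness for the comparison in step 5 might look like the following. It assumes you supply two zero-argument callables that run the same query against the native and the translated table; those callables, and the function names, are hypothetical. The clock is injectable so the harness itself can be tested without running real queries.

```python
import statistics
import time
from typing import Callable


def median_runtime(
    run_query: Callable[[], object],
    repeats: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Median wall-clock seconds over `repeats` executions of run_query."""
    samples = []
    for _ in range(repeats):
        start = clock()
        run_query()
        samples.append(clock() - start)
    return statistics.median(samples)


def pruning_slowdown(
    run_native: Callable[[], object],
    run_translated: Callable[[], object],
    repeats: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Ratio of translated to native median runtime (1.0 means no degradation)."""
    native_s = median_runtime(run_native, repeats, clock)
    translated_s = median_runtime(run_translated, repeats, clock)
    if native_s <= 0:
        raise ValueError("native runtime too small to compare")
    return translated_s / native_s
```

Run the comparison on selective queries with filters on the columns you expect to prune on, since missing min/max statistics show up there first.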

  6. Design ingestion pipelines with cross-publishing. For new tables, decide at write time whether to enable cross-format metadata generation. This avoids retrofitting interoperability onto existing tables.

    • UniForm: set the table property at table creation.
    • XTable: integrate the sync job into your data pipeline orchestration.
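
For the UniForm bullet above, a table-creation statement can be generated as sketched below. The three property names are taken from the Delta Lake UniForm documentation as best understood here; required properties differ between Delta versions, so treat them as an assumption to verify. The helper name and the example table are hypothetical.

```python
from typing import Dict

# Properties assumed to enable Iceberg metadata generation on a Delta table.
UNIFORM_PROPERTIES: Dict[str, str] = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}


def uniform_create_table_sql(table: str, columns: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement with UniForm enabled from the start."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    props = ",\n".join(f"  '{key}' = '{value}'" for key, value in UNIFORM_PROPERTIES.items())
    return f"CREATE TABLE {table} ({cols}) USING DELTA\nTBLPROPERTIES (\n{props}\n)"


sql = uniform_create_table_sql("sales.orders", {"id": "BIGINT", "ts": "TIMESTAMP"})
```

The resulting string would be submitted through whatever SQL interface the pipeline already uses, for example a Spark session.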

What Changed Over Time

  • Before 2023, format lock-in was accepted as the cost of choosing an ecosystem. Iceberg users could not read Delta tables and vice versa without full data conversion.
  • Delta UniForm (2023) was among the first production-grade interoperability mechanisms, reflecting Databricks' strategic decision to support Iceberg readers without abandoning Delta as the write format.
  • Apache XTable (incubating, originally OneTable from Onehouse) generalized the approach to support any-to-any format translation, including Hudi.
  • As of 2026, interoperability is metadata-only — the underlying Parquet files are shared, but each format maintains its own transaction log. True format unification remains unrealized.

Sources