Guide 36

POSIX Compatibility on Object Storage — When You Need a Filesystem Over S3

Problem Framing

S3 is a key-value store with HTTP semantics, not a filesystem. It lacks atomic rename, returns directory listings via paginated API calls, and requires HTTP range requests for random reads. Most modern data tools — DuckDB, LanceDB, Spark, Trino — speak S3 natively and don't need filesystem semantics. But ML training frameworks (PyTorch DataLoader), legacy analytics tools, and POSIX-dependent applications assume they can open, seek, rename, and list files the way a local filesystem does.
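The rename gap is worth seeing concretely. S3 has no rename primitive; a "rename" is a CopyObject followed by a DeleteObject, and a concurrent reader can observe the intermediate state where both keys exist. A minimal Python simulation of this, using a plain dict as a stand-in bucket (the `s3_rename` helper is illustrative, not a real SDK call):

```python
# Simulate S3's lack of atomic rename: "rename" is copy-then-delete,
# so another client listing the bucket mid-operation sees both keys.
# The dict stands in for a bucket; s3_rename is an illustrative helper.

def s3_rename(bucket: dict, src: str, dst: str, observer=None):
    bucket[dst] = bucket[src]          # step 1: CopyObject
    if observer:
        observer(dict(bucket))         # another client lists mid-rename
    del bucket[src]                    # step 2: DeleteObject

seen = []
bucket = {"data/part-0.tmp": b"rows"}
s3_rename(bucket, "data/part-0.tmp", "data/part-0.parquet",
          observer=lambda snapshot: seen.append(sorted(snapshot)))

print(sorted(bucket))  # ['data/part-0.parquet'] — only the destination remains
print(seen[0])         # mid-rename, BOTH keys were visible
```

A crash between the two steps leaves both keys (or, with the delete reordered, neither), which is exactly the failure mode that commit protocols built on atomic rename cannot tolerate.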

JuiceFS, GeeseFS, and other FUSE-based mounts bridge this gap by presenting S3 data as a mounted filesystem. But each takes a different approach: JuiceFS splits files into chunks with external metadata, GeeseFS maps S3 objects 1:1 to files with aggressive caching, and Mountpoint for Amazon S3 provides read-optimized access to existing S3 buckets. Choosing the wrong bridge adds latency, operational complexity, or both.

Relevant Nodes

  • Topics: Object Storage, S3
  • Technologies: JuiceFS, GeeseFS, SeaweedFS, MinIO, Garage
  • Standards: S3 API
  • Architectures: Separation of Storage and Compute
  • Pain Points: Lack of Atomic Rename, Directory Namespace / Listing Bottlenecks

Decision Path

  1. Do you actually need POSIX? Before adding a filesystem layer, check whether your tools support S3 natively. DuckDB reads S3 via httpfs. LanceDB writes indexes directly to S3. Spark and Trino use S3 connectors. If every tool in your pipeline speaks S3, a POSIX bridge adds complexity for no benefit.

  2. Read-only mount or full read-write? If you only need to read existing S3 data as files (e.g., feeding training data to PyTorch), GeeseFS or Mountpoint for Amazon S3 provide lightweight read-optimized FUSE mounts with minimal overhead. If you need full POSIX read-write semantics including atomic rename, JuiceFS is the only one of the three that implements these operations correctly.

  3. Can you accept a metadata engine dependency? JuiceFS requires an external metadata store — Redis for performance, PostgreSQL for durability, or TiKV for scale. This is a stateful component that must be backed up and monitored. GeeseFS and Mountpoint are stateless — they translate S3 operations directly.

  4. What's the I/O access pattern? Large sequential reads (training data, log processing) work well with any FUSE mount — the overhead is amortized across large transfers. Small random reads (database-style lookups, frequent seeks) suffer significant latency over S3 HTTP round-trips. JuiceFS mitigates this with local chunk caching, but the latency gap versus a real filesystem remains.
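To make step 3's trade-off concrete, here is a toy model of the JuiceFS-style split: file contents live in the object store as fixed-size chunks, while the name-to-chunk mapping lives in a separate metadata store. Because the mapping is external, rename becomes a single metadata update that never touches object data. Chunk size, key scheme, and names here are illustrative, not JuiceFS's actual format:

```python
# Toy model of the chunk + external-metadata design (JuiceFS-style).
# Chunk size and key layout are illustrative, not the real format.
CHUNK = 4  # bytes per chunk here; real systems use MiB-scale chunks

objects = {}   # stands in for the S3 bucket (chunk key -> bytes)
metadata = {}  # stands in for Redis/PostgreSQL/TiKV (path -> chunk keys)

def write_file(path: str, data: bytes):
    keys = []
    for i in range(0, len(data), CHUNK):
        key = f"chunks/{path}/{i // CHUNK}"
        objects[key] = data[i:i + CHUNK]
        keys.append(key)
    metadata[path] = keys

def rename(src: str, dst: str):
    # One atomic metadata mutation; no object is copied or deleted.
    metadata[dst] = metadata.pop(src)

def read_file(path: str) -> bytes:
    return b"".join(objects[k] for k in metadata[path])

write_file("/train/batch.bin", b"0123456789ab")
rename("/train/batch.bin", "/train/epoch-0.bin")
print(read_file("/train/epoch-0.bin"))  # b'0123456789ab'
print(len(objects))                     # 3 chunks, untouched by the rename
```

The cost of this design is visible in the same sketch: if the `metadata` dict is lost, the chunks in `objects` are unreadable — which is why the metadata engine must be backed up and monitored as a first-class stateful service.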
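Step 4's amortization argument can be checked with back-of-envelope arithmetic. Assuming illustrative figures of ~30 ms time-to-first-byte per S3 GET and ~100 MB/s sustained throughput (actual numbers vary by region, object size, and concurrency), the per-request latency is negligible for one large sequential read but dominates when the same bytes arrive as many small random range reads:

```python
# Back-of-envelope: request latency vs transfer time for S3 reads.
# Latency and throughput figures are illustrative assumptions.
FIRST_BYTE_S = 0.030     # ~30 ms time-to-first-byte per GET
THROUGHPUT_BPS = 100e6   # ~100 MB/s sustained streaming

def read_time(total_bytes: float, requests: int) -> float:
    """Total seconds: one first-byte penalty per request plus transfer time."""
    return requests * FIRST_BYTE_S + total_bytes / THROUGHPUT_BPS

GIB = 1 << 30
sequential = read_time(GIB, requests=1)           # one big sequential GET
random_4k = read_time(GIB, requests=GIB // 4096)  # 4 KiB range requests

print(f"sequential: {sequential:.1f} s")  # transfer-dominated: ~11 s
print(f"random 4K:  {random_4k:.0f} s")   # latency-dominated: hours
```

Local chunk caching (as in JuiceFS) collapses the per-request penalty on repeat reads, but the first cold pass still pays it — which is why random-read-heavy workloads are the worst fit for any S3-backed mount.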

What Changed Over Time

  • JuiceFS matured its S3 gateway mode, allowing applications to access JuiceFS data via S3 API without FUSE — bridging both directions.
  • GeeseFS emerged from Yandex Cloud as a performant read-optimized FUSE mount specifically for ML training workloads on S3.
  • AWS released Mountpoint for Amazon S3 as a first-party FUSE client, but limited to read-heavy and sequential write workloads — no random writes or rename.
  • The trend toward S3-native tools (DuckDB httpfs, LanceDB on S3, Spark S3A) reduced the need for POSIX bridges in most modern data architectures.

Sources