Technology

JuiceFS

Summary

What it is

A POSIX-compliant distributed filesystem that uses S3-compatible object storage as its data backend and a separate metadata engine (Redis, PostgreSQL, or TiKV) for file metadata.

Where it fits

JuiceFS bridges the gap between applications that expect POSIX filesystem semantics and data stored on S3. It lets ML training frameworks, legacy applications, and POSIX-dependent tools access S3 data as a mounted filesystem, without rewriting code to call S3 APIs directly. Unlike simple FUSE mounts that expose existing objects as files (GeeseFS, Mountpoint for S3), JuiceFS stores file data as chunked S3 objects and keeps all file metadata in an external metadata engine.

Misconceptions / Traps
  • JuiceFS adds a metadata engine dependency (Redis, PostgreSQL, or TiKV). This is a stateful component that must be backed up, scaled, and monitored, so the deployment is not purely serverless. The metadata engine, not S3, becomes the most critical dependency: a metadata engine outage (for example, Redis going down) takes the whole filesystem offline even if S3 is healthy.
  • POSIX semantics over S3 add latency. Random reads and small-file operations that miss the local cache are slower than on a native filesystem, because each one pays an S3 round-trip for the data it needs.
  • JuiceFS is not a FUSE mount that maps S3 objects 1:1 to files. It splits each file into chunks of up to 64 MiB, which are further divided into blocks (4 MiB by default) that are stored as S3 objects, with metadata kept separately. This is a fundamentally different architecture: object storage admins reading bucket contents directly will see opaque data blocks, not files.
  • Metadata engine choice is the single biggest deployment decision. Redis offers the best raw IOPS and the simplest operations, but throughput is bounded by a single node. TiKV is horizontally distributed and suited to billion-file scale (reference deployments: ByteDance, Xiaohongshu), at the cost of real operational complexity. PostgreSQL offers SQL-grade durability and the easiest operations if you already run managed Postgres, but lower throughput than Redis or TiKV.
  • JuiceFS is architecturally similar to Amazon S3 Files (April 2026, AWS-only) and is the vendor-neutral, self-hostable equivalent. The choice is "AWS managed and convenient" vs. "any-cloud and operationally explicit."
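
The chunking point above can be made concrete with a small sketch. It assumes the layout described in JuiceFS's architecture documentation (64 MiB logical chunks subdivided into blocks of 4 MiB by default, with blocks being the units uploaded to object storage) and deliberately ignores slices, so it only models a sequentially written file. The function names `locate` and `blocks_for_read` are illustrative, not part of any JuiceFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # logical chunk size (64 MiB)
BLOCK_SIZE = 4 * 1024 * 1024   # assumed default block size (4 MiB)


def locate(offset: int) -> tuple[int, int, int]:
    """Map a byte offset to (chunk index, block index in chunk, offset in block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    block_index, offset_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, offset_in_block


def blocks_for_read(offset: int, length: int) -> list[tuple[int, int]]:
    """List the (chunk index, block index) pairs touched by a read."""
    if length <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE
    return [divmod(b, blocks_per_chunk) for b in range(first, last + 1)]
```

Under these assumptions, a two-byte read that straddles a block boundary touches two objects, which is one reason small random reads that miss the cache are comparatively expensive.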

Architecture posture: files are split into chunks of up to 64 MiB, then into blocks (4 MiB by default) → blocks are uploaded as immutable S3 objects → the metadata engine stores the directory tree, chunk mapping, locks, and permissions. Cache tiers: kernel page cache → local NVMe disk → S3. Hot reads are served at NVMe-class latency; cold reads pay an S3 round-trip.
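
The cache tiers can be illustrated with a toy read-through cache. This is a minimal sketch, not JuiceFS's implementation: an in-memory LRU stands in for the kernel page cache, a plain dictionary stands in for the local NVMe cache directory, and `fetch_remote` is a hypothetical callable standing in for an S3 GET.

```python
from collections import OrderedDict
from typing import Callable


class TieredReader:
    """Toy read-through cache: memory -> local disk -> object store."""

    def __init__(self, fetch_remote: Callable[[str], bytes], memory_slots: int = 2):
        self._fetch_remote = fetch_remote   # stand-in for an S3 GET
        self._memory = OrderedDict()        # stand-in for the page cache (LRU)
        self._disk = {}                     # stand-in for the local NVMe cache
        self._memory_slots = memory_slots

    def read(self, key: str) -> tuple[bytes, str]:
        """Return the data and the name of the tier that served it."""
        if key in self._memory:
            self._memory.move_to_end(key)   # mark as most recently used
            return self._memory[key], "memory"
        if key in self._disk:
            data, tier = self._disk[key], "disk"
        else:
            data, tier = self._fetch_remote(key), "remote"
            self._disk[key] = data          # populate the disk tier on a miss
        self._memory[key] = data
        if len(self._memory) > self._memory_slots:
            self._memory.popitem(last=False)  # evict the least recently used
        return data, tier
```

In this model only the first read of a block pays the remote round-trip; later reads are served from memory or, after eviction from memory, from the disk tier.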

Where it fits in the stack: the POSIX bridge for ML training (PyTorch DataLoader, NVIDIA DALI), shared K8s filesystem (CSI driver), legacy app integration with object storage, multi-region distributed compute over a single shared namespace. Pair with a cache tier (Alluxio or local-NVMe) for AI training; pair with high-IOPS Redis or TiKV when metadata operations dominate.

Key Connections
  • depends_on S3 — uses S3 as the data storage backend
  • depends_on Redis / TiKV / PostgreSQL — the metadata engine is a hard dependency
  • solves Lack of Atomic Rename — atomic rename implemented in the metadata engine
  • solves Cold Scan Latency — local NVMe cache layer
  • scoped_to Object Storage — bridges POSIX and S3 semantics
  • alternative_to Amazon S3 Files — vendor-neutral, self-hostable equivalent
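
To show why a rename handled in the metadata engine can be atomic and cheap, here is a toy model. It is a sketch under stated assumptions: a lock-protected dictionary stands in for a transactional metadata engine, and `ToyMetadataStore` and its methods are hypothetical names. A real deployment would rely on the transaction support of Redis, TiKV, or PostgreSQL instead.

```python
import threading


class ToyMetadataStore:
    """Toy stand-in for a metadata engine: paths map to lists of object keys."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._files = {}

    def create(self, path: str, object_keys: list[str]) -> None:
        with self._lock:
            self._files[path] = list(object_keys)

    def rename(self, src: str, dst: str) -> None:
        """Move the entry in one critical section; no object data is copied."""
        with self._lock:
            if src not in self._files:
                raise FileNotFoundError(src)
            self._files[dst] = self._files.pop(src)

    def lookup(self, path: str) -> list[str]:
        with self._lock:
            return list(self._files[path])
```

The design point is that the data objects are never touched: only the path-to-objects mapping changes, whereas a rename emulated directly on S3 needs a copy followed by a delete.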

Definition

What it is

A POSIX-compatible distributed filesystem that uses S3-compatible object storage as its data layer and a separate metadata engine (Redis, TiKV, or PostgreSQL) for file metadata. It bridges traditional filesystem semantics and object storage economics. The metadata engine choice is the single biggest deployment decision: **Redis** for high-throughput single-node setups (best raw IOPS, no clustering), **TiKV** for horizontally distributed metadata at billion-file scale (production references: ByteDance, Xiaohongshu), or **PostgreSQL** when an existing managed Postgres simplifies operations.

Why it exists

Many workloads (ML training frameworks, legacy analytics tools, POSIX-dependent applications) require filesystem semantics such as atomic rename, directory listing, and random reads that S3 does not natively provide. JuiceFS layers these semantics on top of S3 without requiring application rewrites. It is architecturally similar to **Amazon S3 Files** (released April 2026, AWS-only) but vendor-neutral and self-hostable against any S3-compatible backend.
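
From the application's side, "no rewrites" means ordinary file calls keep working. The sketch below uses only the Python standard library. A temporary directory stands in for a mount point (a path such as /mnt/jfs would be a hypothetical example), and the helper names `publish_atomically` and `read_range` are illustrative.

```python
import os
import tempfile
from pathlib import Path


def publish_atomically(directory: Path, name: str, payload: bytes) -> Path:
    """Write under a temporary name, then rename into place."""
    tmp = directory / f".{name}.tmp"
    final = directory / name
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)  # atomic rename on POSIX filesystems
    return final


def read_range(path: Path, offset: int, length: int) -> bytes:
    """Random read: seek to an offset and read a fixed number of bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# A temporary directory stands in for the mounted filesystem.
with tempfile.TemporaryDirectory() as mount_point:
    root = Path(mount_point)
    target = publish_atomically(root, "shard-0000.bin", b"0123456789")
    print(sorted(p.name for p in root.iterdir()))  # ['shard-0000.bin']
    print(read_range(target, 3, 4))                # b'3456'
```

The same code would need an S3 SDK, ranged GET requests, and a copy-then-delete step to get comparable behavior directly against a bucket.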

Primary use cases

  • POSIX filesystem access to S3-backed data for ML training (PyTorch DataLoader, NVIDIA DALI)
  • Shared filesystem for Kubernetes pods backed by S3 (CSI driver available)
  • Legacy application integration with object storage
  • Multi-region distributed compute reading from a single shared namespace
  • Lakehouse scratch tier with NVMe-class metadata operations

Recent developments

Latest signals
  • JuiceFS 2025 recap: 590k+ file systems, 1.3 EiB stored, 400B+ files. Per JuiceFS's 2025 recap blog post, the Community Edition crossed 590,000 file systems (an 82% year-over-year increase), 150,000 active clients, 400 billion files, and 1.3 EiB of stored data (an 89% increase). The Enterprise Edition introduced RDMA support during 2025 for AI training workloads where the kernel TCP/IP path becomes the bottleneck. The growth shape, with file count growing faster than client count, reflects the AI-training-data use case, where individual jobs touch tens of millions of files.
  • Active release cadence: 5.3.5 → 5.3.7 in Q2 2026. Per the JuiceFS Cloud release notes, the 5.3.x line continues to ship incremental improvements: 5.3.7 adds the find subcommand and an access-log option, plus improved Ceph listing and jfs:// protocol sync; 5.3.6 adds the attr subcommand and reduces memory usage; 5.3.5 adds mount-timeout handling. The cadence is dense (three patch releases in roughly a quarter), which signals an actively maintained project rather than one in stasis.
