Technology

GeeseFS

A high-performance FUSE-based filesystem that provides POSIX-compatible access to S3-compatible object storage, optimized for AI/ML training data loading.


Summary

Where it fits

GeeseFS solves the impedance mismatch between ML frameworks that expect POSIX file access and training data stored in S3. It mounts S3 buckets as local directories, using aggressive caching and read-ahead to minimize the FUSE performance penalty.
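A concrete mount invocation, as a sketch: it assumes GeeseFS is installed and that S3 credentials are available through the usual AWS mechanisms (`~/.aws/credentials` or environment variables). The bucket name, endpoint URL, and mount point are placeholders.

```shell
# Mount the bucket "my-training-data" at /mnt/dataset.
# --endpoint points at any S3-compatible service (placeholder URL).
mkdir -p /mnt/dataset
geesefs --endpoint https://storage.example.com my-training-data /mnt/dataset

# The bucket now reads like a local directory:
ls /mnt/dataset

# Unmount when done (standard FUSE unmount):
fusermount -u /mnt/dataset
```

From this point, training code can open files under /mnt/dataset with ordinary POSIX calls; no S3 SDK is involved on the read path.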

Misconceptions / Traps
  • Every FUSE operation round-trips between the kernel and a userspace daemon, so FUSE throughput is inherently limited by context switches and data copies. GeeseFS mitigates this with aggressive caching and read-ahead, but cannot match native filesystem performance for metadata-heavy operations.
  • Write performance through FUSE to S3 is significantly slower than reads, since writes must ultimately be flushed as S3 upload operations. GeeseFS is optimized for read-heavy ML training workloads, not write-heavy ingestion.
Key Connections
  • depends_on S3 API — mounts S3 buckets via FUSE
  • scoped_to Object Storage for AI Data Pipelines — POSIX access layer for ML workloads

Definition

What it is

A high-performance FUSE-based filesystem that presents S3-compatible storage as a POSIX-mountable filesystem, optimized for AI/ML training workloads that require filesystem semantics over S3 data.

Why it exists

Many ML frameworks expect POSIX file paths, not S3 URIs. GeeseFS provides a high-throughput FUSE mount with aggressive caching and prefetching, enabling these frameworks to read S3 data without code changes.

Primary use cases

POSIX access to S3 for ML training, legacy application migration to S3-backed storage, dataset browsing with filesystem tools.
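The dataset-browsing use case can be sketched with ordinary filesystem tools. Here `/mnt/dataset` is a placeholder for a GeeseFS mount point; the same commands work on any directory, which is the point of the POSIX access layer.

```shell
# Placeholder mount point; any directory behaves the same way.
DATASET=${DATASET:-/mnt/dataset}

# Count image files in the dataset.
find "$DATASET" -type f -name '*.jpg' | wc -l

# Total size of the training split
# (under a GeeseFS mount this translates into S3 list/stat requests).
du -sh "$DATASET/train"
```

Note that recursive tools like `find` and `du` are exactly the metadata-heavy operations where the FUSE overhead shows; they are convenient for browsing, not for hot training loops.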
