Technology
Concrete tools, systems, or platforms with version histories and maintainers.
72 nodes
Amazon's fully managed object storage service, the origin and reference implementation of the S3 API. As of December 2025, the ma…
An open-source, S3-compatible object storage server designed for high performance and self-hosted deployment. As of February 2026,…
A distributed storage system providing object, block, and file storage in a unified platform. S3 compatibility via its RADOS Gatew…
A scalable, distributed object storage system in the Hadoop ecosystem with an S3-compatible interface.
An open table format for large analytic datasets. Manages metadata, snapshots, and schema evolution for collections of data files …
An open table format and storage layer providing ACID transactions, scalable metadata, and schema enforcement on data stored in ob…
A table format and data management framework optimized for incremental data processing — upserts, deletes, and change data capture…
A lakehouse metadata format that stores table metadata in an embedded SQL database (DuckDB) instead of file-based manifests on S3.…
An in-process analytical database engine (like SQLite for analytics) that reads Parquet, Iceberg, and other formats directly from …
A distributed SQL query engine for federated analytics across heterogeneous data sources, with deep support for S3-backed data lak…
A column-oriented DBMS designed for real-time analytical queries, with native support for reading from and writing to S3.
A distributed compute engine for large-scale data processing — batch ETL, streaming, SQL, and machine learning — over S3-stored da…
A vector database that stores data in the Lance columnar format directly on object storage. Designed for serverless vector search …
An open-source vector database with hybrid search combining BM25 keyword matching and vector similarity in a single query, plus mu…
A Rust-based vector search engine with native payload filtering and a custom HNSW index implementation that applies metadata filte…
A distributed vector database built for billion-scale similarity search, using a microservices architecture with SSD caching for h…
An MPP analytical database with native lakehouse capabilities, able to directly query S3 data in Parquet, ORC, and Iceberg formats…
A distributed stream processing framework that processes data in real-time, with S3 as checkpoint store, state backend, and output…
An AWS S3 storage class delivering single-digit millisecond latency for frequently accessed data. Uses directory buckets in a sing…
An AWS-managed feature providing native Apache Iceberg tables as a built-in S3 capability, with automated compaction, snapshot man…
A native vector storage and search capability built into S3, enabling storage and querying of embeddings directly in S3 without a …
An AWS feature that automatically generates queryable metadata tables (in Apache Iceberg format) over S3 objects, enabling SQL-bas…
An open-source distributed storage system with an S3-compatible API, architecturally optimized for billions of small and large fil…
An S3-compatible object storage service from Cloudflare with zero egress fees, integrated with the Cloudflare global edge network.
A low-cost S3-compatible cloud storage service with free egress to CDN partners through the Bandwidth Alliance, designed for cost-…
An S3-compatible cloud storage service with a fixed pricing model — no egress fees, no API request fees, approximately $5–7/TB/mon…
A disaggregated all-flash data platform providing unified access via S3, NFS, and SMB protocols, optimized for AI and deep learnin…
An enterprise-grade software-defined object storage platform from Dell with S3-compatible API, designed for on-premise and hybrid …
A software-defined S3-compatible object storage system with policy-driven information lifecycle management (ILM), designed for ent…
An all-flash unified file and object storage platform from Pure Storage with S3-compatible API, designed for AI, analytics, and mo…
A lightweight, self-hosted, geo-distributed S3-compatible object storage system designed for small distributed clusters, edge depl…
A unified data access layer providing a single API for accessing 40+ storage backends including S3, GCS, Azure Blob, HDFS, and loc…
A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored…
A Kubernetes storage orchestrator that deploys and manages Ceph clusters on Kubernetes, providing K8s-native S3-compatible object …
A high-performance FUSE-based filesystem that provides POSIX-compatible access to S3-compatible object storage, optimized for AI/M…
A POSIX-compliant distributed filesystem that uses S3-compatible object storage as its data backend and a separate metadata engine…
An open-source REST catalog for Apache Iceberg with centralized RBAC, originally developed by Snowflake and donated to Apache.
A unified metadata lake — "catalog of catalogs" — that federates Iceberg, Hive, Kafka, and file-based data sources into a single g…
An open-source, multi-format data catalog by Databricks (Linux Foundation), supporting Iceberg, Delta Lake, Hudi, and unstructured…
A zero-copy metadata translator (Apache incubating, formerly OneTable) that converts between Iceberg, Delta Lake, and Hudi metadat…
A Delta Lake feature that automatically generates Iceberg and Hudi metadata for Delta tables, enabling cross-format reads without …
An Apache top-level streaming lakehouse table format built on LSM-tree architecture, designed for high-frequency real-time writes …
Apache Flink connectors for reading database change logs (MySQL binlog, PostgreSQL WAL) and streaming them directly into lakehouse…
A managed real-time data integration platform with exactly-once connectors for streaming data from databases and SaaS APIs into S3…
A Python-native stream processing framework built on a Rust-based Timely Dataflow engine, designed for real-time data transformati…
A platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs) written in Pytho…
A high-performance, Rust-based, S3-compatible object storage server positioned as a truly open-source alternative to MinIO.
The reference implementation for OpenLineage — an open-source metadata and lineage service with a web UI for visualizing data flow…
A framework for fine-grained security and centralized auditing across the Hadoop and lakehouse ecosystem, providing column-level a…
An S3 feature that reduces KMS API calls by up to 99% by caching encryption key material at the bucket level rather than making in…
A stateless, S3-native data streaming platform with Kafka protocol compatibility. No local disks, no brokers to manage — all data …
A real-time analytical database with native lakehouse capabilities, querying Iceberg, Hudi, and Paimon tables on S3 directly. Late…
An enterprise storage platform with S3-compatible object storage, delivering hardware-defined performance guarantees at petabyte s…
A purpose-built, hardware-defined storage appliance providing S3-compatible object storage on Ceph with auditable supply-chain man…
AWS's fully managed metadata catalog service that stores table definitions, partition information, and schema metadata for data st…
The original metadata catalog service from the Apache Hive project that stores table schemas, partition mappings, and storage loca…
A lakehouse query engine that provides SQL analytics directly on S3-stored data with integrated Iceberg table management, data ref…
AWS's serverless, pay-per-query SQL engine that runs queries directly against data stored in S3 without requiring infrastructure p…
An open-source distributed platform for change data capture (CDC) that streams row-level changes from databases (PostgreSQL, MySQL…
An extensible query execution framework written in Rust, built on Apache Arrow, that provides a SQL query planner and execution en…
A high-performance DataFrame library written in Rust with Python and Node.js bindings, designed for fast columnar analytics with l…
An Apache Kafka feature (KIP-405) that offloads older log segments from broker-local disks to S3-compatible object storage, extend…
A Kafka-compatible streaming platform written in C++ that provides a single binary deployment with built-in Tiered Storage to S3, …
An open-source transactional catalog for data lakes that provides Git-like branching, tagging, and commit semantics for Iceberg ta…
An open-source data integration platform that provides pre-built connectors for extracting data from hundreds of sources (APIs, da…
Apache Spark's stream processing API that enables continuous, micro-batch, or near-real-time ingestion of data streams into S3-bac…
A C++ vectorized execution engine developed by Meta that provides a unified, high-performance data processing backend usable by mu…
A Python library for declarative data loading (data load tool) that simplifies building data pipelines to extract from APIs and lo…
An open-source metadata platform providing a centralized catalog for data discovery, quality, lineage, and governance across S3-ba…
An open-source metadata platform originally developed at LinkedIn that provides data discovery, lineage tracking, governance, and …
An open-source metadata management and governance framework originally built for the Hadoop ecosystem, providing classification, l…
An S3-compatible, globally distributed object storage platform engineered to optimize small-object workloads through metadata inli…