Technology
Concrete tools, systems, or platforms with version histories and maintainers.
114 nodes
Amazon's fully managed object storage service, the origin and reference implementation of the S3 API. As of December 2025, the ma…
An open-source, S3-compatible object storage server designed for high performance and self-hosted deployment. As of February 2026,…
A community-maintained AGPL v3 fork of MinIO created after the upstream repository was archived in February 2026 and permanently r…
An open-source (Apache 2.0) S3-compatible gateway that translates S3 API calls into POSIX filesystem operations. A thin translatio…
A distributed storage system providing object, block, and file storage in a unified platform. S3 compatibility via its RADOS Gatew…
A scalable, distributed object storage system in the Hadoop ecosystem with an S3-compatible interface.
An open table format for large analytic datasets. Manages metadata, snapshots, and schema evolution for collections of data files …
An open table format and storage layer providing ACID transactions, scalable metadata, and schema enforcement on data stored in ob…
A table format and data management framework optimized for incremental data processing — upserts, deletes, and change data capture…
A lakehouse metadata format that stores table metadata in an embedded SQL database (DuckDB) instead of file-based manifests on S3.…
An in-process analytical database engine (like SQLite for analytics) that reads Parquet, Iceberg, and other formats directly from …
A federated AI/data runtime that combines embedded DuckDB compute with native delegation to Amazon S3 Vectors for similarity searc…
A distributed SQL query engine for federated analytics across heterogeneous data sources, with deep support for S3-backed data lak…
A column-oriented DBMS designed for real-time analytical queries, with native support for reading from and writing to S3.
A distributed compute engine for large-scale data processing — batch ETL, streaming, SQL, and machine learning — over S3-stored da…
A vector database that stores data in the Lance columnar format directly on object storage. Designed for serverless vector search …
An open-source vector database with hybrid search combining BM25 keyword matching and vector similarity in a single query, plus mu…
A Rust-based vector search engine with native payload filtering and a custom HNSW index implementation that applies metadata filte…
A commercial vector database launched by Actian in April 2026, multi-cloud (AWS/Azure/GCP), built on FAISS + OnDiskIVF indices wit…
A distributed vector database built for billion-scale similarity search, using a microservices architecture with SSD caching for h…
The de facto open-source PostgreSQL extension for vector similarity search. Adds a `vector` data type plus indexed nearest-neighbo…
A high-performance PostgreSQL extension for vector similarity search, positioned as a **drop-in replacement for pgvector** with or…
An open-source distributed search + analytics engine forked from Elasticsearch in 2021, now governed by the **OpenSearch Software …
An MPP analytical database with native lakehouse capabilities, able to directly query S3 data in Parquet, ORC, and Iceberg formats…
A distributed stream processing framework that processes data in real-time, with S3 as checkpoint store, state backend, and output…
An AWS S3 storage class delivering single-digit millisecond latency for frequently accessed data, using Directory Buckets in a sin…
An AWS-managed feature providing native Apache Iceberg tables as a built-in S3 capability with automated Binpack / Sort / Auto com…
Native vector storage and similarity search built into S3, operating under a dedicated `s3vectors` AWS service namespace with its …
An AWS feature that automatically generates queryable metadata tables (in Apache Iceberg format) over S3 objects, enabling SQL-bas…
AWS's serverless compute service — pay-per-invocation function execution with managed runtime, no server provisioning. **Now mount…
A POSIX file-system interface over general-purpose S3 buckets, launched April 7, 2026. Any bucket can be mounted as an NFS v4.1 or…
An open-source distributed storage system with an S3-compatible API, architecturally optimized for billions of small and large fil…
An S3-compatible object storage service from Cloudflare with zero egress fees, integrated with the Cloudflare global edge network.
A low-cost S3-compatible cloud storage service with free egress to CDN partners through the Bandwidth Alliance, designed for cost-…
An S3-compatible cloud storage service with a fixed pricing model — no egress fees, no API request fees, approximately $5–7/TB/mon…
Wasabi Technologies' AI-augmented object storage tier — facial recognition, speech-to-text, OCR, and logo detection run inline as …
A budget-tier S3-compatible cloud object storage service from IDrive (the established backup vendor), priced at **~$5/TB/month** with **ze…
A Sweden-headquartered S3-compatible object storage provider, **launched May 2026**, priced at **€5/TB/month** with zero egress fe…
**OVHcloud's** S3-compatible object storage service from France's largest cloud provider. Three storage classes — **Standard (~$5/…
Alibaba Cloud's S3-compatible Object Storage Service — the dominant object store across mainland China. Standard bucket/key data m…
Tencent Cloud's Cloud Object Storage — S3-compatible, the storage backbone for Tencent's gaming, video, fintech, and Hunyuan AI tr…
Huawei Cloud's Object Storage Service — S3-compatible, tightly co-engineered with Huawei's domestic AI accelerator (Ascend 910B/91…
A disaggregated all-flash data platform providing unified access via S3, NFS, and SMB protocols, optimized for AI and deep learnin…
An enterprise-grade software-defined object storage platform from Dell with an S3-compatible API, designed for on-premises and hybrid …
A software-defined S3-compatible object storage system with policy-driven information lifecycle management (ILM), designed for ent…
An all-flash unified file and object storage platform from Pure Storage with an S3-compatible API, designed for AI, analytics, and mo…
Enterprise-grade software-defined object storage from Hitachi, S3-compatible, with native Iceberg-aware S3 Tables functionality an…
Hewlett Packard Enterprise's enterprise scale-out object storage platform, S3-compatible, with native data-intelligence services b…
A lightweight, self-hosted, geo-distributed S3-compatible object storage system designed for small distributed clusters, edge depl…
An open-source distributed data caching and orchestration layer between S3-compatible object storage and compute (Spark, Trino, Py…
**Fire-Flyer File System** — DeepSeek's high-performance distributed file system purpose-built for AI training and inference, **op…
A unified data access layer providing a single API for accessing 40+ storage backends including S3, GCS, Azure Blob, HDFS, and loc…
A Git-like version control system for data lakes on S3, providing branching, committing, merging, and rollback for datasets stored…
A Kubernetes storage orchestrator that deploys and manages Ceph clusters on Kubernetes, providing K8s-native S3-compatible object …
A high-performance FUSE-based filesystem that provides POSIX-compatible access to S3-compatible object storage, optimized for AI/M…
A POSIX-compliant distributed filesystem that uses S3-compatible object storage as its data backend and a separate metadata engine…
An open-source REST catalog for Apache Iceberg with centralized RBAC, originally developed by Snowflake and donated to Apache.
A unified metadata lake — "catalog of catalogs" — that federates Iceberg, Hive, Kafka, and file-based data sources into a single g…
An open-source, multi-format data catalog by Databricks (Linux Foundation), supporting Iceberg, Delta Lake, Hudi, and unstructured…
A zero-copy metadata translator (Apache incubating, formerly OneTable) that converts between Iceberg, Delta Lake, and Hudi metadat…
A Delta Lake feature that automatically generates Iceberg and Hudi metadata for Delta tables, enabling cross-format reads without …
An Apache top-level streaming lakehouse table format built on LSM-tree architecture, designed for high-frequency real-time writes …
Apache Flink connectors for reading database change logs (MySQL binlog, PostgreSQL WAL) and streaming them directly into lakehouse…
A managed real-time data integration platform with exactly-once connectors for streaming data from databases and SaaS APIs into S3…
A Python-native stream processing framework built on a Rust-based Timely Dataflow engine, designed for real-time data transformati…
A platform for programmatically authoring, scheduling, and monitoring workflows as directed acyclic graphs (DAGs) written in Pytho…
A high-performance, S3-compatible object storage server written in Swift on SwiftNIO, distributed under Apache 2.0. Uses ARC (Auto…
A high-performance, Rust-based, S3-compatible object storage server positioned as a truly open-source alternative to MinIO.
The reference implementation for OpenLineage — an open-source metadata and lineage service with a web UI for visualizing data flow…
A framework for fine-grained security and centralized auditing across the Hadoop and lakehouse ecosystem, providing column-level a…
An S3 feature that reduces KMS API calls by up to 99% by caching encryption key material at the bucket level rather than making in…
A stateless, S3-native data streaming platform with Kafka protocol compatibility. No local disks, no brokers to manage — all data …
A real-time analytical database with native lakehouse capabilities, querying Iceberg, Hudi, and Paimon tables on S3 directly. Late…
An enterprise storage platform with S3-compatible object storage, delivering hardware-defined performance guarantees at petabyte s…
A purpose-built, hardware-defined storage appliance providing S3-compatible object storage on Ceph with auditable supply-chain man…
AWS's fully managed metadata catalog service that stores table definitions, partition information, and schema metadata for data st…
The original metadata catalog service from the Apache Hive project that stores table schemas, partition mappings, and storage loca…
A lakehouse query engine that provides SQL analytics directly on S3-stored data with integrated Iceberg table management, data ref…
A unified data + AI platform built on Apache Spark and Delta Lake, with a managed lakehouse covering data engineering, SQL analyti…
AWS's serverless, pay-per-query SQL engine that runs queries directly against data stored in S3 without requiring infrastructure p…
An open-source distributed platform for change data capture (CDC) that streams row-level changes from databases (PostgreSQL, MySQL…
An extensible query execution framework written in Rust, built on Apache Arrow, that provides a SQL query planner and execution en…
A high-performance DataFrame library written in Rust with Python and Node.js bindings, designed for fast columnar analytics with l…
An Apache Kafka feature (KIP-405) that offloads older log segments from broker-local disks to S3-compatible object storage, extend…
A Kafka-compatible streaming platform written in C++ that provides a single binary deployment with built-in Tiered Storage to S3, …
An open-source transactional catalog for data lakes that provides Git-like branching, tagging, and commit semantics for Iceberg ta…
An open-source data integration platform that provides pre-built connectors for extracting data from hundreds of sources (APIs, da…
Apache Spark's stream processing API that enables continuous, micro-batch, or near-real-time ingestion of data streams into S3-bac…
A C++ vectorized execution engine developed by Meta that provides a unified, high-performance data processing backend usable by mu…
A Python library for declarative data loading (data load tool) that simplifies building data pipelines to extract from APIs and lo…
An open-source metadata platform providing a centralized catalog for data discovery, quality, lineage, and governance across S3-ba…
An open-source metadata platform originally developed at LinkedIn that provides data discovery, lineage tracking, governance, and …
An open-source metadata management and governance framework originally built for the Hadoop ecosystem, providing classification, l…
A command-line program that synchronizes files and directories to and from cloud storage, supporting **70+ backends** through a si…
A multimodal vector store (MVS) testing and benchmarking platform that evaluates S3-compatible providers for AI/ML workloads — fee…
An S3-compatible, globally distributed object storage platform engineered to optimize small-object workloads through metadata inli…
NVIDIA's client/server library stack released November 2025 that moves S3-compatible object data directly from storage-node memory…
A WireGuard-based secure mesh-networking platform. In April 2026, Tailscale added an S3-compatible export for log and telemetry da…
An open-source universal memory layer for AI agents, distributed under Apache 2.0. Provides persistent semantic memory backed by S…
An open-source AI memory platform (Apache 2.0) built around the **Graphiti** temporal-knowledge-graph engine. Zep stores semantic …
The open-source temporal knowledge-graph engine that powers Zep. Real-time knowledge-graph construction for AI agents — stores ent…
A high-performance distributed **KV-cache offloading** layer for LLM inference, written to maximize prefix-reuse across vLLM and o…
An open-source LLM serving engine optimized for structured generation and prefix sharing. Distributed under Apache 2.0. The **Radi…
The open-source LLM serving platform for **Kimi**, Moonshot AI's leading LLM product. Repository: [github.com/kvcache-ai/Mooncake]…
A cognitive-memory system for AI agents, distributed as a single ~22MB Rust binary that doubles as an **MCP server** for Claude, C…
An open-source agent-runtime framework built on top of LangChain that models agentic workflows as **state machines** — supervisor/…
An open-source **model gateway** that abstracts the complexity of calling hundreds of different LLM endpoints behind a unified, Op…
An open-source **AI gateway** (MIT-licensed) sitting between the agent runtime and foundation models. Provides observability (per-…
**Traefik Labs**'s commercial AI gateway, layered on the Traefik reverse proxy heritage. In December 2025, Traefik joined the **HP…
NVIDIA's fourth-generation **Data Processing Unit (DPU)**, announced in 2026 as the substrate for a new class of **AI-native stora…
A new storage tier — also referred to as **Context Memory eXtension (CMX)** — sitting between traditional NVMe SSDs and cold S3 bu…
NVIDIA's library that coordinates data movement between storage tiers, GPUs, and inference engines. NIXL provi…
A commercial **memory orchestration** platform for AI workloads, providing software-defined coordination of CXL-attached memory po…
NVIDIA's CUDA library extending **GPUDirect Storage (GDS)** semantics to S3-compatible object storage. Where the original GDS targ…