While modern lakehouse platforms now natively support tables, geospatial data, vectors, and more, property graphs remain a missing piece. With the rise of AI and growing interest in Graph RAG, graphs are becoming increasingly relevant: Knowledge Graphs need to be delivered into RAG systems with proper standards, ETL, and frameworks for different use cases.
A young project, Apache GraphAr (incubating), aims to define a storage standard for graph data. On the processing side, the ecosystem already has strong tooling: GraphFrames, akin to Spark for Iceberg (batch and distributed); Kuzu, akin to DuckDB for Iceberg (fast, in-memory, in-process); and Apache HugeGraph, akin to ClickHouse/Doris for graphs (a standalone server for queries).
There’s also work underway on graphframes-rs, which brings Apache DataFusion and its ecosystem into this landscape. With all these components available, the challenge now is to put the pieces together.