Great link. I've always been drawn to sqlite3 just from a simplicity and operational point of view. And with tools like "make it easy to replicate" Litestream and "make it easy to use" sqlite-utils, it just becomes easier.
And one of the first patterns I wanted to use was this: just a read-only event log that's replicated and very easy to understand and operate. Kafka is a beast to manage and run. We picked it at my last company -- and it was a mistake, when a simple DB would have sufficed.
https://github.com/simonw/sqlite-utils https://litestream.io/
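For what it's worth, that replicated event-log setup needs very little Litestream config. A minimal sketch (the database path and bucket name are placeholders of mine, not from this thread):

    # litestream.yml
    dbs:
      - path: /var/lib/app/events.db
        replicas:
          - url: s3://my-bucket/events

Running "litestream replicate -config litestream.yml" then tails the WAL and continuously ships changes to the replica, and "litestream restore" can rebuild the database from it.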
I love the idea of SQLite, but I actually really dislike using it.
I think part of my issue is that a lot of uses of it end up having a big global lock on the database file (see: older versions of Emby/Jellyfin) so you can't use it with multiple threads or processes, but I also haven't really ever found a case to use it over other options. I've never really felt the need to do anything like a JOIN or a UNION when doing local configuration, and for anything more complicated than a local configuration, I likely have access to Postgres or something. I mean, the Postgres executable is only ten or twenty megs on Linux, so it's not even that much bigger than SQLite for modern computers.
https://www.sqlite.org/c3ref/busy_timeout.html
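For what it's worth, much of that single-writer pain can be softened with WAL mode plus a busy timeout, which lets readers proceed alongside one writer. A minimal sketch using the rusqlite crate (the crate choice and the five-second timeout are my own assumptions, not anything from this thread):

    use std::time::Duration;
    use rusqlite::Connection;

    fn open_db(path: &str) -> rusqlite::Result<Connection> {
        let conn = Connection::open(path)?;
        // WAL mode: readers proceed while a single writer works,
        // avoiding the "big global lock" feel of rollback mode.
        // synchronous=NORMAL is considered safe under WAL: a power
        // cut may lose the most recent commits, but not corrupt.
        conn.execute_batch(
            "PRAGMA journal_mode = WAL;
             PRAGMA synchronous = NORMAL;",
        )?;
        // On contention, retry for up to 5s instead of failing
        // immediately with SQLITE_BUSY.
        conn.busy_timeout(Duration::from_secs(5))?;
        Ok(conn)
    }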
Curious, what do you think about
> PRAGMA synchronous = NORMAL;
I am just not experienced enough to form an opinion.
Peter (the author) is a really, really cool guy. We recorded a 3hr 30m podcast[0] with him a month ago. For anyone interested in the Kafka space, performance optimization in Rust and the general "why yet another Kafka", I'd shamelessly recommend the video:
[0] - https://www.youtube.com/watch?v=pJQ7hcsI1Dw
I love a SQLite-backed system, one less component to worry about. But when using Tansu with SQLite storage, what are my options for horizontal scaling and keeping Tansu HA?
Also, are there any benchmarks on how Tansu with S3 storage would perform in comparison to Kafka or something like WarpStream?
You could use the proxy to spread topics over a number of brokers. The broker and proxy share a number of services and layers that could be used to route:
https://blog.tansu.io/articles/route-layer-service
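As a purely hypothetical illustration of the idea (my own sketch, not Tansu's actual routing code), a proxy layer could deterministically assign each topic to a broker by hashing the topic name, so every client going through the proxy agrees on which broker owns a given topic:

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Hypothetical routing rule: hash the topic name onto the broker
    // list, so requests for the same topic always hit the same broker.
    // Assumes `brokers` is non-empty.
    fn broker_for_topic<'a>(topic: &str, brokers: &'a [String]) -> &'a str {
        let mut hasher = DefaultHasher::new();
        topic.hash(&mut hasher);
        &brokers[(hasher.finish() as usize) % brokers.len()]
    }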
My itch for SQLite was smaller-scale (and reproducible) environments, e.g., development and test/integration (with a single file to reset the environment). PostgreSQL was intended for "larger scale", with (database-level) partitioning of Kafka records on each topic/partition, and replication for leader/follower setups, which might work better for HA. S3 is for environments where latency is less of an issue (though with the SlateDB/S3 engine that might change).
S3: Not yet. I've been working through tuning each engine; S3 is next on the list.
Any good and honest Tansu experience reports out there? It would be nice to understand how “bleeding edge” this actually is in practice. The idea of a Kafka-compatible but trivial-to-run system like this is very intriguing!
My thoughts too.
> kafka compatible
Kafka is not a straightforward protocol and has a few odd niches. Not to mention that message formats have changed over the years. Even the base product has recently dropped support for some of the oldest API versions. And there are still plenty of clients out there using old versions of librdkafka (he says from experience).
So I'd be interested in how (backward-)compatible they are.
I agree that it isn't straightforward! Tansu uses the JSON protocol descriptors from Apache Kafka, generating ~60k LoC of Rust to represent the structures. It then uses a custom Serde encoder/decoder to implement the protocol: the original, flexible and tag buffer formats for every API version (e.g., the 18 versions of FETCH alone). It is based on spending the past ~10 years using Kafka, and writing/maintaining an Erlang client (there are no "good" Kafka clients for Erlang!). It also uses a bunch of collected protocol examples to encode/decode during the tests. Tansu is also a Kafka proxy, which is also used to feed some of those tests.
Some of the detail: https://blog.tansu.io/articles/serde-kafka-protocol
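To give a flavour of what such an encoder has to handle: the flexible versions and tag buffers lean on unsigned varints for lengths and tag IDs. A standalone sketch of that encoding (my own illustration, not code from Tansu):

    // Encode a u32 as an unsigned varint: seven payload bits per
    // byte, with the high bit set on every byte except the last.
    fn encode_uvarint(mut value: u32, out: &mut Vec<u8>) {
        while value >= 0x80 {
            out.push((value as u8) | 0x80);
            value >>= 7;
        }
        out.push(value as u8);
    }

    // Decode, returning the value and how many bytes it consumed.
    fn decode_uvarint(input: &[u8]) -> Option<(u32, usize)> {
        let mut value = 0u32;
        for (i, &byte) in input.iter().enumerate().take(5) {
            value |= u32::from(byte & 0x7f) << (7 * i);
            if byte & 0x80 == 0 {
                return Some((value, i + 1));
            }
        }
        None // truncated input or overlong varint
    }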
However, I am sure there are cases where Tansu isn't compatible. For example, Kafka UI (kafbat) reports a strange error when doing a fetch (despite actually showing the fetched data), which I've yet to get to the bottom of.
If you find any compatibility issues, then please raise an issue, and I can take a look.
I wonder how it compares to Redpanda
I've used Redpanda for local development and test environments. It is super easy to set up in Docker, starts really fast and consumes fewer resources than the Java version. I haven't really compared it to anything, but I remember using the Java version of Kafka before and it was a resource hog. That matters when you develop on a laptop with constrained resources.
To be fair, Kafka now has a GraalVM Docker image[0][1] which was made for local dev/testing, and it has caught up fairly well to these alternatives re: memory and startup time.
[0] - https://cwiki.apache.org/confluence/display/KAFKA/KIP-974%3A... [1] - https://hub.docker.com/r/apache/kafka-native
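For a throwaway single-node KRaft broker with the default dev config, running that image comes down to roughly the following (the tag is illustrative; check Docker Hub for current ones):

    docker run -p 9092:9092 apache/kafka-native:latest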
What I meant was how Tansu compares to Redpanda.
Quite cool. 7000 records per second is usable for a lot of projects.
One note on backup/migrate: I think you need a shared lock on the database before you copy the file. If you don't, the copy can end up corrupted. The SQLite docs have other recommendations too:
https://sqlite.org/backup.html
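For anyone doing this from code, SQLite's online backup API takes the necessary locks for you rather than copying the file directly. A sketch with rusqlite (assumes its "backup" feature is enabled; the paths and step sizes are placeholders):

    use std::time::Duration;
    use rusqlite::{backup::Backup, Connection};

    // Copy a live database via SQLite's online backup API.
    fn backup_db(src_path: &str, dst_path: &str) -> rusqlite::Result<()> {
        let src = Connection::open(src_path)?;
        let mut dst = Connection::open(dst_path)?;
        let backup = Backup::new(&src, &mut dst)?;
        // 64 pages per step, pausing 250ms between steps so other
        // connections can still make progress during the copy.
        backup.run_to_completion(64, Duration::from_millis(250), None)
    }

The sqlite3 CLI's .backup command is built on this same API.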
I didn't know about Tansu and probably would not use it for anything too serious (yet!). But as a firm believer in event sourcing and the paradigm shift that Kafka brings, this is certainly interesting for small projects.
To me it sounds like NATS JetStream but in Rust. I wonder what the reliability will look like once it is production-ready.
JetStream isn't Kafka-compatible, nor does it have pluggable storage (S3, SQLite, PostgreSQL, etc.).
Everything is dead; what lives on is their protocol.
Same for Redis, Kafka, ...
How does it compare to Redis Streams with persistent storage?
This SQLite obsession is getting quite ridiculous. Now they put it in "the Cloud." What a shitshow. I wonder whether they know what SQLite is for... when Cloudflare did it, well, it made sense at least. This new generation of SQLite cargo-culting is beyond anything I've ever seen.
Tansu author here. Storage is a pluggable choice of: PostgreSQL, memory, SQLite or S3. There are others in the pipeline (SlateDB, ...).
Any chance of a Parquet compatible storage choice?
Yes: with a schema-backed topic (Avro, JSON or Protocol Buffers) Tansu can write to Apache Iceberg, Delta or Parquet. You can use a Sink topic to write directly to an open table format (including Parquet), skipping (most of) the Kafka metadata.
https://blog.tansu.io/articles/parquet