Trade-offs vs Redis:
- Not for multi-region distributed systems
- Best for single server or small clusters
Happy to answer any questions about the architecture!
We are using a service that abstracts Redis from us and has to be treated as a critical dependency, like RDS, Aurora, or Postgres: if it is down, the whole site is down. Every job push is a call to this service, and upgrading the service means downtime.
For us this has resulted in a big weak point in our architecture, because when the service reboots both job pushing and job pulling stop, and since pushing happens on the API side, the API goes down with it. With containers we could run multiple instances at the same time, but the shared reading/writing through the abstracted Redis just locks up.
We are considering BullMQ, because the architecture is sane (rough sketch below):
* job push: API writes to Redis
* job pull: Worker reads from Redis then writes the completion.
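Roughly what that split looks like with BullMQ (queue name, connection details, and payload here are just placeholders):

```ts
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 }; // placeholder Redis connection

// API side: push a job to Redis and return immediately.
const emails = new Queue("emails", { connection });
await emails.add("welcome", { userId: 42 });

// Worker side: pull jobs from Redis; the returned value is written back as the result.
new Worker(
  "emails",
  async (job) => {
    // ... do the actual work here
    return { sent: true };
  },
  { connection },
);
```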
How do you see this issue for Bunqueue? What happens when it goes down for 5 minutes? Can jobs still be enqueued? Can you run multiple instances of it, with failover?
Our throughput (jobs/sec) is small, but we do have 100k+ scheduled jobs due anywhere from minutes to months from now.
Great question - let me be transparent about bunqueue's architecture.
Current state: bunqueue is a single-server architecture with SQLite persistence. If the server goes down for 5 minutes:
- Clients cannot push/pull jobs during that window
- The client SDK has automatic reconnection with exponential backoff + jitter (rough sketch after this list)
- All data is safe on disk (SQLite WAL mode) - nothing is lost
- On restart, active jobs are detected as stalled and re-queued automatically
- Delayed/scheduled jobs resume from their run_at timestamps
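To give a feel for that reconnection behavior, here is a sketch of exponential backoff with full jitter. It is illustrative only, not the SDK's actual code, and connect() stands in for whatever transport call the client makes:

```ts
// Sketch: exponential backoff with full jitter, capped at maxMs.
async function reconnect(
  connect: () => Promise<void>, // stands in for the client's real connect call
  baseMs = 100,
  maxMs = 30_000,
): Promise<void> {
  for (let attempt = 0; ; attempt++) {
    try {
      await connect();
      return; // connected again; pushes and pulls resume
    } catch {
      const cap = Math.min(maxMs, baseMs * 2 ** attempt); // exponential growth, capped
      const delay = Math.random() * cap;                  // full jitter avoids a thundering herd
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```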
For your use case (100k+ scheduled jobs, low throughput): bunqueue is actually a good fit. The scheduler uses a MinHeap plus SQLite indexes for an O(k) refresh, where k is the number of jobs becoming ready, rather than an O(n) scan of all scheduled jobs.
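As a rough illustration of that refresh path (the table and column names are made up for this example, not bunqueue's real schema):

```ts
import { Database } from "bun:sqlite";

// Illustrative schema: status + run_at, with an index so the refresh only
// touches jobs that are actually due.
const db = new Database("queue.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,      -- 'delayed' | 'waiting' | 'active' | ...
    run_at INTEGER NOT NULL,   -- epoch ms when the job becomes ready
    payload TEXT
  );
`);
db.exec("CREATE INDEX IF NOT EXISTS idx_jobs_delayed ON jobs (status, run_at);");

// Promote the k due jobs; the index means this never scans all 100k+ rows.
const promote = db.query(
  "UPDATE jobs SET status = 'waiting' WHERE status = 'delayed' AND run_at <= ?",
);

// An in-memory min-heap of upcoming run_at values tells the scheduler when to
// wake next; peeking the earliest timestamp matches this indexed query.
const nextWake = db.query(
  "SELECT MIN(run_at) AS next FROM jobs WHERE status = 'delayed'",
);

promote.run(Date.now());
console.log(nextWake.get());
```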
What bunqueue does NOT have today:
- No clustering or multi-instance with shared state
- No automatic failover
- No replication
What it does have:
- S3 automated backups (compressed, checksummed) for disaster recovery
- "durable: true" option for zero data loss on critical jobs
- Zero external dependencies (no Redis/Postgres to manage)
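As a concrete example of the durable flag (the import path, queue name, and payload are placeholders; only the option itself comes from bunqueue):

```ts
import { Queue } from "bunqueue"; // import path is a guess; check the package README

const payments = new Queue("payments");

// durable: true marks this job for the zero-data-loss path described above.
await payments.add("charge-card", { invoiceId: "inv_123" }, { durable: true });
```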
Roadmap: HA is definitely something we're working toward. The vision includes:
- Native HA with leader election and replication
- Managed cloud offering with HA, automatic failover, and geographic distribution
If you need true HA today, BullMQ + Redis Sentinel/Cluster is the safer choice. bunqueue shines when you want simplicity and high performance (~100k jobs/sec) and can tolerate brief downtime windows with automatic recovery.
Happy to discuss further if you have specific questions about the architecture.
Hi HN! I built bunqueue because I got tired of spinning up Redis just for background jobs.
The idea: for single-server deployments, SQLite can handle 100k+ ops/sec with WAL mode, so why add infrastructure?
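For reference, enabling WAL with Bun's built-in SQLite driver is a one-liner (sketch; the file name is arbitrary):

```ts
import { Database } from "bun:sqlite";

const db = new Database("jobs.db");
// WAL lets readers proceed while a single writer appends to the log,
// which is what makes a high-throughput single-node queue on SQLite practical.
db.exec("PRAGMA journal_mode = WAL;");
```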
Features: priorities, delays, retries, cron jobs, DLQ, job dependencies, BullMQ-compatible API.
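A quick taste of the API, assuming the BullMQ-compatible surface (the import path is a guess, and the option names follow BullMQ conventions; check the README for the real ones):

```ts
import { Queue, Worker } from "bunqueue"; // import path is a guess

const reports = new Queue("reports");

// Push: BullMQ-style options for priority, delay (ms), and retry attempts.
await reports.add(
  "monthly-summary",
  { month: "2024-06" },
  { priority: 1, delay: 60_000, attempts: 3 },
);

// Pull: a worker processes jobs; exhausted retries land in the DLQ.
new Worker("reports", async (job) => {
  console.log("processing", job.name, job.data);
});
```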