Good spot - this is wrong. It should've been 4 x 3.84 TB NVMe SSD RAID 5. My colleague set this bit up so I'm not entirely up to speed on the terminology.
> The more keen eyed among you will have noticed the huge cost associated with data transfer over the internet - its as much as the servers! We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers.
I mean, you're connecting to your primary database potentially on another continent? I imagine your costs will be high, but even worse, your performance will be abysmal.
> When you migrate to a self-hosted solution, you're taking on more responsibility for managing your database. You need to make sure it is secure, backed up, monitored, and can be recreated in case of failure or the need for extra servers arises.
> ...for a small amount of pain you can save a lot of money!
I wouldn't call any of that "a small amount of pain." To save $3,000/month you've now required yourselves to become experts in a domain that may be out of your depth, so whatever cost you saved becomes tech debt, and potentially the cost of hiring someone else to manage your homemade solution for you.
That said, I self-host and applaud other self-hosters. But it really has to make business sense for your team.
> I mean, you're connecting to your primary database potentially on another continent?
Atlas AWS was actually set up in Ireland. The data transfer costs were coming from extracting data for ML modelling. We don't get charged for extracting data with the new contract.
> experts in a domain that maybe is out of your depth
We're in the bot detection space so we need to be able to run our own infra in order to inspect connections for patterns of abuse. We've built up a fair amount of knowledge because of this and we're lucky enough to have a guy in our team who just understands everything related to computers. He's also pretty good at disseminating information.
To be fair, a single server is way more reliable than cloud clusters.
Just look at the most recent many-hour Azure outage, where Microsoft could not even get microsoft.com back. With downtime like that, you could physically move drives between servers multiple times a year and still come out ahead. Servers are very reliable; cloud software is not.
I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"
Lol yep that could've been the headline. We plan to add replica servers at some point. This DB is not critical to our product hence the relaxed interim setup.
As in so many of these stories, what gets glossed over is just how much complexity there is in setting up your own server securely.
You set up your server. Harden it. Follow all the best practices for your firewall with ufw. Then you run a Docker container. Accidentally, or simply because you don’t know any better, you bind it to 0.0.0.0 by doing 5432:5432. Oops. Docker just walked right past your firewall rules, ignored ufw, and now port 5432 is exposed with default Postgres credentials. Congratulations. Say hello to Kinsing.
And this is just one of many possible scenarios like that. I’m not trying to spread FUD, but this really needs to be stressed much more clearly.
EDIT. as always - thank you HN for downvoting instead of actually addressing the argument.
There are also an enormous number of ways to build insecure apps on AWS. I think the difficulty of setting up your own server is massively overblown. And that should be unsurprising given that there are so many companies that benefit from developers thinking it's too hard.
UFW doesn't add much overhead given that the underlying netfilter implementation in Linux is already in place; it's mostly just a convenient front-end. That said, you also need to be concerned with internal/peer threats as well as external ones...
Clearly defining your boundaries is important for both internal and external vectors of attack.
it's getting hard to ignore Hetzner (as a Linode user).
Thing is, Linode was great 10-15 years ago, then enshittification ensued (starting with Akamai buying them).
So what does enshittification for Hetzner look like? I've already got migration scripts pointed at their servers but can't wait for the eventual letdown.
IMO, virtual and dedicated server hosting is really commoditized at this point. So you have a lot of options... assuming you have appropriate orchestration and management scripted out, with good backup procedures in place, you should be able to shift to any other provider relatively easily.
The pain points come when you're also entwined with specific implementations of services from a given provider... Sure, you can shift PostgreSQL from one hosted provider to another without much pain... but, say, SQS to Azure Storage Queues or Service Bus is a lot more involved. And that is just one example.
This is a large reason to keep your services to those with self-hosted options and/or to self-host from the start... that said, I'm happy to outsource things that are easier to (re)integrate or replace.
"I cut my healthcare costs by 90% by canceling insurance and doctor visits."
In all seriousness, this is a recurring pattern on HN and it sends the wrong message. It's almost as bad as vibecoding a paid service and losing private customer data.
There was a thread here a while ago, "How We Saved $500,000 Per Year by Rolling Our Own 'S3'" [1]. Then they promptly got hacked. [2]
Even after reading the source, it doesn’t seem like they were hacked? Or if they were, they were not accused of such.
I do think hand rolling your own thing is fraught. But it is very confusing to equate one mother’s complaint to “they have been hacked”.
PS: The people who made their own S3 ran a baby monitor company. The news article is about a mother reporting hearing a weird voice from the baby monitor.
> Here's how we managed to cut our costs by 90%
You could cut your MongoDB costs by 100% by not using it ;)
> without sacrificing performance or reliability.
You're using a single server in a single datacenter. MongoDB Atlas is deployed to VMs on 2-3 AZs. You don't have close to the same reliability. (I'm also curious why their M40 instance costs $1000, when the Pricing Calculator (https://www.mongodb.com/pricing) says M40 is $760/month? Was it the extra storage?)
> We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers
This means you're going to have multiple outages, AND incur more cross-internet costs. How does going to Hetzner make you more resilient to outages? You have one server in one datacenter. Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB). AWS doesn't force you into a centralized, single-point-of-failure design. They're not dummies; plenty of their services are operated independently per region. But they do expect you to use their infrastructure intelligently to avoid creating a single point of failure. (For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.)
I get it; these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense. There are only a few circumstances (transferring out a lot of traffic, or needing very large storage) where cloud pricing is just too much of a premium. The whole point of using the cloud is to use it as a competitive advantage. Giving yourself an extra role (sysadmin) in addition to your day job (developer, data scientist, etc.) and more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc.) with lower reliability and fewer services isn't an advantage.
> Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out to the cloud ($0.02/GB vs $0.08/GB).
If traffic cost is relevant (which it is for a lot of use cases), Hetzner's price of $1.20/TB ($0.0012 / GB) for internet traffic [1] is an order of magnitude less than what AWS charges between AWS locations in the same metro. If you host only at providers with reasonable bandwidth charges, most likely all of your bandwidth will be billed at less than what AWS charges for inter-zone traffic. That's obscene. As far as I can tell, clouds are balancing their budgets on the back of traffic charges, but nothing else feels under cost either.
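To put rough numbers on that comparison, here is a back-of-envelope sketch in Python; the Hetzner rate is the $1.20/TB from [1], while the AWS figures are the commonly cited list prices (~$0.01/GB charged in each direction for inter-AZ traffic, ~$0.09/GB for the first internet egress tier) and are assumptions, not taken from the article.

```python
# Rough monthly cost of moving 10 TB at the rates discussed above.
tb_per_month = 10
gb = tb_per_month * 1000

hetzner_internet = gb * 0.0012   # $1.20/TB, per Hetzner's traffic docs [1]
aws_inter_az     = gb * 0.02     # ~$0.01/GB billed in each direction
aws_egress       = gb * 0.09     # approximate first internet egress tier

print(f"Hetzner internet traffic: ${hetzner_internet:,.0f}")
print(f"AWS inter-AZ traffic:     ${aws_inter_az:,.0f}")
print(f"AWS internet egress:      ${aws_egress:,.0f}")
```

At 10 TB/month that works out to roughly $12 vs $200 vs $900, which is the order-of-magnitude gap being described.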
> For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.
This doesn't always work out. During the GCP outage, my service was running fine, but other similar services were having trouble, so we attracted more usage, which we would have scaled up for, except that the GCP outage prevented that. The cloud makes it very expensive to run scaled beyond current needs, on the promise that scale-out will be available just in time...
[1] https://docs.hetzner.com/robot/general/traffic/
At some point our cross-AZ traffic for Elasticsearch replication at AWS was more expensive than what we'd pay to host the whole cluster replicated across multiple baremetal Hetzner servers.
Could we have done better with more sensible configs? Was it silly to cluster ES cross-AZ? Maybe. Point is that if you don't police every single detail of your platform at AWS/GCP and the like, their made-up charges will bleed your startup and grease their stock price.
Turns out cross-AZ is recommended for ES. Perhaps our data team was rewriting the indices too often, but that was an internal requirement; the data schema could have been more efficient, appending deltas instead of reindexing everything. None of that will inflate your bill significantly at Hetzner. Of course it will at AWS, as that's how they incentivise clients to optimize and reduce their impact. And that's how you cut your runway by 3-6 months in compute-heavy startups.
I think you underestimate how a reduction in complexity can increase reliability. Becoming a sysadmin for a single inexpensive server carries almost the same operational burden as operating an unavoidably very complicated cluster at a cloud provider.
Not if you are using Atlas. It's as simple as it can be, with way more functionality than you could ever admin yourself.
As others have said, unless the scale of the data is the issue, if you're switching because of cost, perhaps you should be looking back at your business model instead.
> we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.
I think it was just luck of the draw that the failure happened this way and not some other way. Even if "APIs falling over but EC2 instances staying up" is a slightly more likely failure mode, it means you can't run autoscaling and can't depend on spot instances, which you can lose during an outage and then can't replace.
> you're going to have multiple outages
us: 0, aws: 1. Looking good so far ;)
> AND incur more cross-internet costs
Hetzner has no bandwidth traffic limit (only a speed cap) on the machine, so we can go nuts.
I understand your point wrt the cloud, but I spend as much time debugging/building a cloud deployment (Atlas :eyes:) as I do a self-hosted solution. AWS gives you all the tools to build a super reliable data store, but many people just chuck something on us-east-1 and go. There's your single point of failure.
Given we're constructing a many-node decentralised system, self-hosted actually makes more sense for us because we've already had to become familiar enough to create a many-node system for our primary product.
When/if we have a situation where we need high data availability I would strongly consider the cloud, but in the situations where you can deal with a bit of downtime you're massively saving over cloud offerings.
We'll post a 6-month and 1-year follow-up to update the scoreboard above
> many people just chuck something on us-east-1 and go
Even dropping something on a single EC2 node in us-east-1 (or at Google Cloud) is going to be more reliable over time than a single dedicated machine elsewhere. This is because they run with a layer that will e.g. live migrate your running apps in case of hardware failures.
The failure modes of dedicated are quite different than those of the modern hyperscaler clouds.
It's not an apples-to-apples comparison, because EC2 and Google Cloud have ephemeral disk - persistent disk is an add-on, which is implemented with a complex and frequently changing distributed storage system
On the other hand, a Hetzner machine I just rented came with Linux software RAID enabled (md devices in the kernel)
---
I'm not aware of any comparisons, but I'd like to see some
It's not straightforward, and it's not obvious the cloud is more reliable
The cloud introduces many other single points of failure, by virtue of being more complex
e.g. human administration failure, as in the UniSuper incident
https://news.ycombinator.com/item?id=40366867
https://arstechnica.com/gadgets/2024/05/google-cloud-acciden... - “Unprecedented” Google Cloud event wipes out customer account and its backups
Of course, dedicated hardware could have a similar type of failure, but I think the simplicity means there is less variety in the errors.
e.g. "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable" - Leslie Lamport
> by virtue of being more complex
I just wish there was a way to underscore this more and more. Complex systems fail in complex ways. Sadly, for many programmers, the thrill or ego boost that comes with solving/managing complex problems lets us believe complex is better than simple.
Thanks for sharing the story and committing to a 6-month and 1-year follow-up. We will definitely be interested to hear how it went over time.
In the meantime, I am curious where the time was spent debugging and building Atlas deployments? It certainly isn't the cheapest option, but it has been quite a "1-click" solution for us.
I'm curious about the resilience bit. Are you planning on some sort of active-active setup with Mongo? I found it difficult on AWS to even do active-passive (I guess that was DocumentDB), since programmatically changing the primary write node was kind of a pain when failing over to a new region.
Going into any depth with mongo mostly taught me to just stick with postgres.
> You're using a single server in a single datacenter.
This is a common problem with “bare metal saved us $000/mo” articles. Bare metal is cheaper than cloud by any measure, but the comparisons given tend to be misleadingly exaggerated as they don't compare like-for-like in terms of redundancy and support, and after considering those factors it can be a much closer result (sometimes down as far as familiarity and personal preference being more significant).
Of course, unless you are paying extra for multi-region redundancy, things like the recent us-east-1 outage will kill you; and that single point of failure might not really matter if there are several others throughout your systems anyway, as is sometimes the case.
It doesn't have to be one server in a single datacenter, though. It adds some complexity, but you could have a backup server ready to go at a different cheap provider (Hetzner and OVH, for example) and still save a lot.
Premature optimization. Not every single service needs or requires 5 nines.
It's true, but I'm woken up more frequently if there are fewer 9s, which is unpleasant. It's worth the extra cost to me.
and each additional nine increases complexity geometrically.
> You have one server in one datacenter.
It doesn't have to be only one server in one datacenter though.
It's more work, but you can have replicas ready to go at other Hetzner DCs (they offer bare metal at 3 locations in 2 different countries) or at other cheaper providers like OVH. Two or three $160 servers is still cheaper than what they're paying right now.
These types of posts make for excellent karma farming, but this one does present all the issues you've mentioned. Heck, Scaleway has managed Mongo for a bit more money and with redundancy and multi-AZ to boot. Were they trying to go as cheap as possible?
> You could cut your MongoDB costs by 100% by not using it ;)
I cut my Mongo DB costs by 100% by piping my data to /dev/null.
At least it's ACID compliant
https://github.com/dcramer/mangodb
Usually AWS is pretty good at hiding all the reliability and robustness work that goes into making a customer's managed service. Customers are not made aware of what it takes.
I love MongoDB's query language (JS/Node.js developer so the syntax fits my mental model well), but running a production replica set without spending tons of cash is a nightmare. Doubly so if you have any unoptimized queries (it's easy to trick yourself into thinking throwing more hardware at the problem will help). Lord help you if you use a hosted/managed service.
Just fixed a bug on my MongoDB instance last night: due to a config error with self-signed certs (the hostname in the replica set config has to match the CN on the cert), MongoDB rocketed to 400% CPU utilization (3x 8GB/4vCPU dedicated boxes on DO) because of a weird election loop in the replica set process. Fixing that and adding a few missing indexes brought it down to ~12% on average. Simple mistakes, sure, but the real-world cost of those mistakes is brutal.
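For anyone curious what "adding a few missing indexes" looks like in practice, a minimal pymongo sketch (database, collection, and field names are invented); $indexStats and explain() are the usual ways to spot collection scans before adding anything:

```python
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["app"]

# See how often each existing index is actually being used.
for stat in db.events.aggregate([{"$indexStats": {}}]):
    print(stat["name"], stat["accesses"]["ops"])

# Confirm a suspect query is doing a COLLSCAN...
print(db.events.find({"user_id": 42}).explain()["queryPlanner"]["winningPlan"])

# ...then give it a supporting index.
db.events.create_index([("user_id", ASCENDING), ("created_at", ASCENDING)])
```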
The dump, restore, and custom scripts to synchronize the new instance sound a bit odd. You could just add the instance as a secondary to your cluster and Mongo itself handles synchronization; then removing the old instances automatically promotes the new one to primary.
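A rough sketch of that approach using pymongo admin commands (hostnames invented); the same thing is usually done interactively with rs.add()/rs.remove() in mongosh:

```python
from pymongo import MongoClient

# Connect through the existing replica set so commands reach the primary.
client = MongoClient("mongodb://old-box-1.example:27017/?replicaSet=rs0")

# Fetch the current replica set configuration and append the new member;
# MongoDB performs the initial sync to the new box automatically.
cfg = client.admin.command("replSetGetConfig")["config"]
cfg["members"].append({
    "_id": max(m["_id"] for m in cfg["members"]) + 1,
    "host": "new-hetzner-box.example:27017",
})
cfg["version"] += 1
client.admin.command("replSetReconfig", cfg)

# Once the new member reports SECONDARY here, the old members can be
# removed the same way and an election promotes the remaining node.
for member in client.admin.command("replSetGetStatus")["members"]:
    print(member["name"], member["stateStr"])
```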
OK guys, running on a single instance is REALLY a BAD IDEA for non-pet-projects. Really bad! Change it as fast as you can.
I love Hetzner for what they offer, but you will run into huge outages pretty soon. At a minimum you need two different network zones on Hetzner and three servers.
It's not hard to setup, but you need to do it.
I think you're being overly dramatic. In practice I've seen complexity (which HA setups often introduce) causing downtimes far more often than a service being hosted only on a single instance.
Yes, any time someone says "I'm going to make a thing more reliable by adding more things to it" I either want to buy them a copy of Normal Accidents or hit them over the head with mine.
You'll have planned downtime just for upgrading the MongoDB version or rebooting the instance. I don't think that's something you'd want to have. Running MongoDB in a replica set is really easy, and much easier than running Postgres or MySQL in an HA setup.
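A sketch of why a replica set avoids that planned downtime (hostnames invented; assumes MongoDB 4.2+, where a stepdown no longer drops client connections): you upgrade and reboot the secondaries one at a time, then ask the primary to step down before touching it.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://db1.example:27017/?replicaSet=rs0")

# Ask the current primary to step down for up to 60 seconds; an up-to-date
# secondary is elected and retryable writes carry on while the old primary
# is upgraded/rebooted like the others were.
client.admin.command("replSetStepDown", 60)
```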
No need for SREs. Just add 2 more Hetzner servers.
We have used Hetzner for 15+ years. There were some outages, with the nastiest being the network ones. But they're usually not "dramatically bad" if you build with at least basic failover. With that, we have seen less than 1 serious outage per 3 years. Most of the downtime is because of our own stupidity.
If you know what you're doing, Hetzner is a godsend: they give you hardware and several DCs, and it's up to you what you do with them. The money difference is massive.
There are so many applications the world is running on that only have one instance that is maybe backed up. Not everything has to be solved by 3 reliability engineers.
Agree on single instance, but as for Hetzner: I run 100+ large bare metal servers there, and have for at least 5 years, and there's only been one significant outage. We spread across all their datacenter zones and replicate, so it's all been manageable. It's worth it for us, very worth it.
I experienced some cutthroat commercial behavior from MongoDB. It scared us enough to avoid Atlas, and ultimately move to Cosmos on Azure. Massive savings.
I moved to another employer that was using Atlas, and the bill rivaled AWS. Unfortunately it was too complex to untangle.
As much as I like MongoDB as a developer, the last thing I ever want to do is manage a deployment again.
I feel like some of these articles miss a few points, even in this one. The monthly cost of the MongoDB hosting was around $2k... that's less than a FT employee salary, and if it can spare you the cost of an employee, that's not a bad thing.
On the flip side, if you have employee talent that is already orchestrating Kubernetes across multiple clouds, then sure it makes sense to internalize services that would otherwise be external if it doesn't add too much work/overhead to your team(s).
In either case, I don't think the primary driver in this is cost at all. Because that 90% quoted reduction in hosting costs is balanced by the ongoing salary of the person or people who maintain those systems.
Atlas is plain robbery. I see companies paying 600K USD/month on a few clusters, mostly used for testing. The problem is they got locked in by doing a huge migration of their apps, and switching to a different tech would easily take 2 to 5 years.
I'm a big fan of owning the stack, but why not spend the money on redundancy? At least a couple of machines in a different data center at Hetzner or another provider (OVH, Scaleway, Vultr, …) can easily fit your budget.
We will be adding additional db servers and running our own replica set eventually. We're just not there yet. Thanks for reading!
But then you’ll be tripling your costs.
Business people are weird about numbers. You should have claimed 70% even if the replicas do nothing and made them work later on. This is highly likely to bite you on the ass.
+1, this is so true. You've lost; you've already publicly declared that you saved 90%. They won't like the idea of tripling the costs, even if it is still below the previous costs.
Always consider whether 12 hours of lost revenue is worth the savings. Recently Hetzner has been flaky, with minimal or no response from support, or even status updates that anything was wrong. My favorite was them blaming an issue on my side, just to have a maintenance status update the day after about congestion.
Atlas wasn't giving us any support for $3K per month. Hetzner at least has some channel to contact them, which is an improvement. That said, if their uptime is rubbish then we'll probably migrate again. Moving back to Atlas is not an option as we were getting hammered by the data transfer costs, and this was only going to increase due to our architecture. Thanks for reading!
500GB isn't a lot of data, and $3K/month seems like extortion for that little data.
Having said that, MongoDB's pricing page promises 99.995% uptime, which is outstanding and would probably be hard to beat doing it oneself, even after adding redundancy. But maybe you don't need that much uptime for your particular use case.
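For scale, here is what 99.995% allows in raw minutes (assuming it's measured over a full year; the SLA's own accounting rules decide what actually counts):

```python
sla = 0.99995
minutes_per_year = 365 * 24 * 60
print(f"Allowed downtime: {(1 - sla) * minutes_per_year:.1f} minutes/year")
# ~26 minutes/year; a single server with planned reboots and version
# upgrades will usually exceed that on its own.
```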
That's all fine and such, but I suppose the SLAs aren't covering your revenue loss.
In fact, after looking at https://www.mongodb.com/legal/sla/atlas/data-federation#:~:t... it makes me wonder how much the SLA is worth. 10% service credit, after all the limitations?
Atlas can take their 10% service credit, I wouldn't care. Save the money and choose a stable provider.
It's more like 700GB now on the new server, and we were about to have to migrate to a higher tier on Atlas.
> maybe you don't need that much uptime for your particular use case.
Correct. Thanks for reading!
Yep, we just migrated to Atlas, and the disk size limitation of the lower instance tiers pushed us to do a round of data cleaning before the migration.
Also, we noticed that after migration, the databases that were occupying ~600GB of disk in our (very old) on premise deployment, were around 1TB big on Atlas. After talking with support for a while we found that they were using Snappy compression with a relatively low compression level and we couldn't change that by ourselves. After requesting it through support, we changed to zstd compression, rebuilt all the storage, and a day or two later our storage was under 500GB.
And backup pricing is super opaque. The docs don't show concrete pricing, just ranges. And depending on the cloud you deployed to, snapshots are priced differently, so you can't just multiply your storage by the number of snapshots, and they aren't transparent about the real size of the snapshots.
All the storage stuff is messy and expensive...
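For reference, when you run mongod yourself the compressor switch described above doesn't need a support ticket. A hedged sketch (collection name invented): the cluster-wide default lives in mongod.conf under storage.wiredTiger.collectionConfig.blockCompressor, and individual collections can override it at creation time.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]

# Create a collection whose blocks are compressed with zstd instead of the
# snappy default; existing collections keep their old compressor until
# rebuilt (e.g. via initial sync or dump/restore), as described above.
db.create_collection(
    "events_zstd",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)
```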
> Having said that, MongoDB pricing page promises 99.995% uptime
Or.. what? That's the important part
OVH is allegedly pretty good. I host all my personal stuff on Hetzner right now so I can't speak to it personally.
We also use OVH and have so far not had any downtime in about 6 months.
My Hetzner instances all have higher reliability and uptime than AWS deployments. For years now.
That was an interesting surprise.
If I understand correctly, the author's company provides a CAPTCHA alternative, which presumably means that if their service goes down, all of their customers' logins, forms, etc. either become inoperable or don't provide the security the company is promising by using their service.
This makes me want to use the company's service less because now I know they can't survive an outage in a consistent and resilient way.
I've been using Hetzner for 5 years, never had issues, and had only 1 outage in one data center.
Note, if you're looking for MongoDB Enterprise features you can find many of them with Percona Server for MongoDB, which you can use for free the same way as MongoDB Community
Nice, thanks for the tip!
MongoDB Atlas was around 500% more expensive than in-house every time I evaluated it (at almost every scale they offer as well).
They also leaned too heavily on sharding as a universal solution to scaling as opposed to leveraging the minimal cost of terabytes of RAM. The p99 latency increase, risk of major re-sharding downtime, increased restore times, and increased operational complexity weren't worth it for ~1 TB datasets.
MongoDB Atlas is so overpriced that you can probably save already 90% by moving to AWS.
Most of the cost in their bill wasn't from MongoDB, it was cost passed on from AWS
I don't remember the numbers (90% is probably a bit exaggerated) but our savings of going from Atlas to MongoDB Community on EC2 several years ago were big.
In addition to direct costs, Atlas also had expensive limitations. For example, we often spin up clone databases from a snapshot which have lower performance and no durability requirements, so a smaller non-replicated server suffices, but Atlas required those to be sized like the replicated high-performance production cluster.
Was it? Assuming an M40 cluster consists of 3 m6g.xlarge machines, that's $0.46/hr on-demand compared to Atlas's $1.04/hr for the compute. Savings plans or reserved instances reduce that cost further.
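Sanity-checking those numbers with a quick calculation (the m6g.xlarge rate is the approximate us-east-1 on-demand price, an assumption; the Atlas figure is the one quoted in the comment):

```python
hours_per_month = 730

ec2_cluster = 3 * 0.154 * hours_per_month   # 3x m6g.xlarge at ~$0.154/hr each
atlas_m40   = 1.04 * hours_per_month        # Atlas M40 compute, per the comment

print(f"EC2 (3x m6g.xlarge): ${ec2_cluster:,.0f}/month")
print(f"Atlas M40 compute:   ${atlas_m40:,.0f}/month")
print(f"Atlas premium:       {atlas_m40 / ec2_cluster:.1f}x before storage/backup")
```

That lands around $337/month for the EC2 trio vs roughly $760/month for the M40 compute, which also matches the calculator price mentioned upthread.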
Highly doubt that. MongoDB has 5000 well-paid employees and is not a big loss-making enterprise. If most of the cost were passed through to AWS, they'd not be able to do that. Their quarterly revenue is $500M+, but they also spend $200M on sales and marketing and $180M on R&D. (All based on their filings.)
Just host on a server in your basement. Put another instance in someone else's basement. I'm only half joking - track the downtime.
You probably want to store the backup somewhere else, i.e. not Hetzner.
They are known to just cancel accounts and cut access.
Any proof of that? I am a Hetzner customer and had never heard of this before. Would be good to know what I got into.
A few years back I launched an io game and used Hetzner as my backend. An hour into launch day they null-routed my account because their anti-abuse system thought my sudden surge in WebSocket connections was an attack (unclear if they thought it was inbound or outbound doing the attacking).
I had paid for advertising on a few game curation sites plus YouTubers and streamers. Lovely failure, all thanks to Hetzner. Took 3 days and numerous emails with the most arrogant Germans you've ever met before my account was unlocked.
I switched to OVH and while they’re not without their own faults (reliability is a big one), it’s been a far better experience.
OVH also null routes, it has happened to me.
It seems like you have to go to one of the big boys like Hurricane Electric, where you are allowed to use the bandwidth you paid for without someone sticking their fingers in it.
There are a lot of such stories if you go digging around HN and reddit threads. Haven't seen a lot of these stories in a while, so it may be happening less now.
Good shout. I think we'll also run replicas on other providers. We've got some complex geo-fencing stuff to do with regards to data hence why we're just on Hetzner right now.
How long does mongodump take on that database? My experience was that incremental filesystem/block-device snapshots were the only realistic way of backing up (non-sharded) MongoDB. In our case EBS snapshots, but I think you can achieve the same using LVM or filesystems like XFS and ZFS.
It takes ~21 hrs to dump the entire db (~500 GB), but I'm limited by my internet speed (100 Mbps, seeing 50-100 Mbps during the dump). Interestingly, the throughput is faster than doing a db dump from Atlas, which used to max out around 30 Mbps.
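For comparison, a rough sketch of the snapshot-style backup the parent comment describes, which avoids streaming a full dump over the network (ZFS dataset name invented; with the journal on the same volume a WiredTiger snapshot is already consistent, and the lock just makes the cut point explicit):

```python
import subprocess
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017", directConnection=True)

# Flush writes and block new ones while the snapshot is taken.
client.admin.command("fsync", lock=True)
try:
    subprocess.run(["zfs", "snapshot", "tank/mongo@nightly"], check=True)
finally:
    client.admin.command("fsyncUnlock")

# The snapshot can then be shipped off-box incrementally (zfs send -i ...)
# instead of re-dumping ~500 GB every time.
```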
I'm starting to worry about this Hetzner trend. It could end up sending prices skyrocketing.
There are other providers (OVH, etc.), so I'm sure pricing will remain competitive.
prices just dropped. :)
Hopefully not. Their console is pretty bad so I reckon that will put a lot of people off.
The cloud console is pretty good though? Even does live sync!
The old one for dedicated servers (robot) is horribly outdated though.
Ah right, we're on robot so I've not seen the cloud one. Robot is old! :)
The new console is completely fine.
EC2 is sort of a ceiling price.
We're just going to end up with everyone moving from Amazon to Hetzner, and the same issues will remain: high prices, lock-in, etc. will appear.
We need an American “get off American big tech” movement.
Differentiate, people! Reading "we moved from X to Y" does not mean everyone should move from X to Y; it means start considering what Y offers and researching other Y's around you.
We also use OVH, Contabo, Hostwinds... Architect so you can be multi-provider and reduce internet centralisation!
Nice. If you write an article about it, try to leave the focus off a single hosting provider; encouraging differentiation is important too. (Next time! I'm not dogging the movement or your efforts in this article, I love to see reduced reliance on Amazon in general.)
> We need an American “get off American big tech” movement.
As a non-American, I use Hetzner precisely to have my projects not hosted anywhere near the US.
Hetzner is German?
> Hetzner is German?
Yes. Hetzner is a German company from Gunzenhausen.
https://en.wikipedia.org/wiki/Hetzner
I've run a small Mongo database and had it hosted in 3 different places at one point. The last was Atlas; yes, it was expensive, but we got replication, we could have an analytical node, we even had data residency. If I remember correctly you can have your replicas at different providers at the same time.
One of the biggest issues was cost, but we were treated like first-class citizens, the support was good, and we saw constant updates and features. Using Atlas Search was fantastic because we didn't have to replicate the data to another resource for quick searching.
Before Atlas we were on Compose.io, and, well, Mongo there just withered and we were plagued by performance issues.
I recently did a total cost of ownership analysis for moving off AWS to Hetzner: https://beuke.org/hetzner-aws/
Why in the world do people choose Mongo over Postgres? I'm legit curious. Is it inexperience? JavaScript developers who don't know backend or proper data modeling (or about JSONB)? Is this type of decision coming down from non-technical management? Are VCs telling their portfolio companies what to use so they have something to burn their funding on? It's just really confounding, especially when there are even Mongo-API-compatible Postgres solutions now. Perhaps I'm just not webscale and too cranky.
Personally I've found it faster to build using Mongo because you don't need to worry about schemas. You get 16 MB per document and you can work out your downstream processing later, e.g. clean up and serve to Postgres, a file, wherever. This data is a big data dump that's feeding ML models, so relational stuff is not that important.
I used to build personal projects like this, but after Postgres got JSONB support I haven't found any reason to not just start with Postgres. There's usually a couple of tables/columns you want a proper schema for, and having it all in Postgres to begin with makes it much easier to migrate the schemaless JSONB blobs later on.
You definitely do have to worry about a schema. Except it’s ill defined and scattered across your business logic.
It depends on your use case, and an RDBMS isn't the best option for all needs. Mongo's approach is pretty usable. That said, there are alternatives: you can get very similar characteristics, though with a more painful devex, out of, say, CockroachDB with (key: string, value: JSONB) tables.
The only thing I really don't care for is managing Mongo... as a developer, using it is pretty joyous assuming you can get into the query mindset of how to use it.
Also, if you're considering Mongo, you might want to look at Cassandra/ScyllaDB or CockroachDB as alternatives that could be a better fit for your needs and are, IMO, easier to administer.
I'll repeat it again: you don't always want a relational database. Sometimes you need a document-oriented one. It matches quite a lot of use cases, e.g. when there aren't really interesting relations, or when the structures are very deep. That can be really annoying in SQL.
> when there's even mongo-api compatible Postgres solutions
With their own drawbacks.
I'd probably use a JSONB column in Postgres for data that I knew was going to be unstructured; meanwhile, the other columns can join and have decent constraints and indexes.
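For what it's worth, a minimal sketch of that hybrid pattern (table and column names are invented for illustration; uses psycopg2 against a local Postgres):

    # strict columns where constraints/joins matter, JSONB for the unstructured tail
    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection details
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id         bigserial   PRIMARY KEY,
            user_id    bigint      NOT NULL,               -- real constraint, joinable
            created_at timestamptz NOT NULL DEFAULT now(),
            payload    jsonb       NOT NULL                -- the schemaless part
        );
        CREATE INDEX IF NOT EXISTS events_payload_gin ON events USING gin (payload);
    """)

    cur.execute(
        "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
        (42, Json({"type": "click", "meta": {"x": 1}})),
    )

    # containment query answered via the GIN index
    cur.execute("SELECT id FROM events WHERE payload @> %s", (Json({"type": "click"}),))
    conn.commit()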
IMHO it's because so many people make decisions in a rush, e.g. "let's not design the database, just put in whatever data shape we came up with in the alpha version and see where it goes". Sometimes people favor one particular technology because every other startup chose it.
To be quite honest, today's software engineering is sadly mostly about addressing 'how complex can we go' rather than 'what problem are we trying to solve'.
maybe instead of communicating how dumb you think people are for choosing mongo, communicate why you think it’s so dumb
I've read a lot more about "how dumb it is to use mongo over PG" than the opposite, I think the burden of proof is on the mongo-lovers these days (not that anyone has to prove anything to randos on the internet)
Why Mongo is dumb has been written about ad nauseam - data modeling and quality issues, out-of-control costs, etc. It's been a known toxic dumpster fire for well over a decade...
Hetzner routinely refuses to accept you as a customer, so while you can cut costs, it's a privilege.
Are you sure you went with RAID1 with 4x disks instead of RAID10?
Good spot - this is wrong. It should've been 4 x 3.84 TB NVMe SSD in RAID 5. My colleague set this bit up, so I'm not entirely up to speed on the terminology.
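(For anyone following along, rough usable capacities for four 3.84 TB drives under the layouts mentioned, ignoring filesystem and metadata overhead:)

    n, disk_tb = 4, 3.84
    raid10 = n * disk_tb / 2     # striped mirrors:             ~7.68 TB usable
    raid5  = (n - 1) * disk_tb   # one drive's worth of parity: ~11.52 TB usable
    raid1  = disk_tb             # a single 4-way mirror:       ~3.84 TB usable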
> The more keen eyed among you will have noticed the huge cost associated with data transfer over the internet - its as much as the servers! We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers.
I mean, you're connecting to your primary database potentially on another continent? I imagine your costs will be high, but even worse, your performance will be abysmal.
> When you migrate to a self-hosted solution, you're taking on more responsibility for managing your database. You need to make sure it is secure, backed up, monitored, and can be recreated in case of failure or the need for extra servers arises.
> ...for a small amount of pain you can save a lot of money!
I wouldn't call any of that "a small amount of pain." To save $3,000/month, you've now required yourselves to become experts in a domain that may be out of your depth. So whatever cost you saved is now tech debt, plus potentially having to hire someone else to manage your homemade solution for you.
However, I self-host, and applaud other self-hosters. But sometimes it really has to make business sense for your team.
> I mean, you're connecting to your primary database potentially on another continent?
Atlas on AWS was actually set up in Ireland. The data transfer costs were coming from extracting data for ML modelling. We don't get charged for extracting data under the new contract.
> experts in a domain that maybe is out of your depth
We're in the bot detection space so we need to be able to run our own infra in order to inspect connections for patterns of abuse. We've built up a fair amount of knowledge because of this and we're lucky enough to have a guy in our team who just understands everything related to computers. He's also pretty good at disseminating information.
Thanks for reading!
aww shucks ;)
I hope you don't mind if I hijack this post to ask:
Is there a provider similar to Hetzner but US based?
I’ve never heard of or used them, but this was linked in a previous Hetzner thread: https://ioflood.com/
"We replaced a cluster of virtualized servers with a single bare metal server. Nothing has gone wrong, yet."
There are many cases when some downtime is perfectly OK, or at least worth the savings.
They saved a little under $3k/month and were motivated by the AWS outage.
To be fair, a single server is way more reliable than cloud clusters.
Just look at the most recent many-hour Azure outage, where Microsoft couldn't even get microsoft.com back up. With outages like that, you could physically move drives between servers multiple times a year and still come out ahead. Servers are very reliable; cloud software is not.
I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"
Lol yep that could've been the headline. We plan to add replica servers at some point. This DB is not critical to our product hence the relaxed interim setup.
As in so many of these stories, what gets glossed over is just how much complexity there is in setting up your own server securely.
You set up your server. Harden it. Follow all the best practices for your firewall with ufw. Then you run a Docker container. Accidentally, or simply because you don’t know any better, you bind it to 0.0.0.0 by doing 5432:5432. Oops. Docker just walked right past your firewall rules, ignored ufw, and now port 5432 is exposed with default Postgres credentials. Congratulations. Say hello to Kinsing.
And this is just one of many possible scenarios like that. I’m not trying to spread FUD, but this really needs to be stressed much more clearly.
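To make the pitfall concrete, here's a minimal sketch (using the Docker SDK for Python purely as an illustration; the image tag and password are placeholders) of publishing a port on every interface versus binding it to loopback only:

    import docker

    client = docker.from_env()

    # DANGEROUS: equivalent to "-p 5432:5432" -- published on 0.0.0.0, and Docker's own
    # iptables rules sidestep ufw, so the port is reachable from the internet.
    # client.containers.run("postgres:16", detach=True, ports={"5432/tcp": 5432})

    # Safer: equivalent to "-p 127.0.0.1:5432:5432" -- reachable only from the host itself.
    client.containers.run(
        "postgres:16",
        detach=True,
        environment={"POSTGRES_PASSWORD": "change-me"},
        ports={"5432/tcp": ("127.0.0.1", 5432)},
    )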
EDIT: as always, thank you HN for downvoting instead of actually addressing the argument.
There are also an enormous number of ways to build insecure apps on AWS. I think the difficulty of setting up your own server is massively overblown. And that should be unsurprising given that there are so many companies that benefit from developers thinking it's too hard.
I don't see the point of using ufw at all as Hetzner provides an external firewall.
UFW doesn't add much overhead, given the underlying packet-filtering machinery in Linux (netfilter) is already in place; it's mostly just a convenient front-end. That said, you also need to be concerned with internal/peer threats as well as external ones...
Clearly defining your boundaries is important for both internal and external vectors of attack.
If you use a dedicated Hetzner machine you only get a stateless firewall. That would be one reason.
It's getting hard to ignore Hetzner (as a Linode user).
Thing is, Linode was great 10-15 years ago, then enshittification ensued (starting with Akamai buying them).
So what does enshittification for Hetzner look like? I've already got migration scripts pointed at their servers but can't wait for the eventual letdown.
IMO, virtual server and dedicated server hosting is really commoditized at this point, so you have a lot of options... assuming you have appropriate orchestration and management scripted out, with good backup procedures in place, you should be able to shift to any other provider relatively easily.
The pain points come when you're also entwined with provider-specific implementations of services... Sure, you can shift PostgreSQL from one hosted provider to another without much pain... but, say, SQS to Azure Storage Queues or Service Bus is a lot more involved. And that is just one example.
That is a large reason to stick to services with self-hosted options and/or to self-host from the start... that said, I'm happy to outsource things that are easier to (re)integrate or replace.
"I cut my healthcare costs by 90% by canceling insurance and doctor visits."
In all seriousness, this is a recurring pattern on HN and it sends the wrong message. It's almost as bad as vibecoding a paid service and losing private customer data.
There was a thread here a while ago, 'How We Saved $500,000 Per Year by Rolling Our Own "S3"' [1]. Then they promptly got hacked. [2]
[1] https://engineering.nanit.com/how-we-saved-500-000-per-year-...
[2] https://www.cbsnews.com/colorado/news/colorado-mom-stranger-...
Even after reading the source, it doesn’t seem like they were hacked? Or if they were, they were not accused of such.
I do think hand rolling your own thing is fraught. But it is very confusing to equate one mother’s complaint to “they have been hacked”.
PS: The people who made their own S3 ran a baby monitor company. The news article is about a mother reporting hearing a weird voice from the baby monitor.
Multiple reports on Reddit suggest the people making this baby cam do not understand security.
https://www.reddit.com/r/NewParents/comments/1ocgmoi/nanit_c... https://www.reddit.com/r/Nanit/comments/1ffc051/nanit_hacked... https://www.reddit.com/r/Nanit/comments/1dyaph6/heard_a_voic...