I would put $100 that within 6 months of that, we'll get a post on here from someone whose startup has gone under because AWS deleted their account when they didn't pay their bill, and they didn't realise their data would be deleted.
> (the cap should be a rate, not a total)
this is _way_ more complicated than there being a single cap.
> I would put $100 that within 6 months of that, we'll get a post on here from someone whose startup has gone under because AWS deleted their account when they didn't pay their bill, and they didn't realise their data would be deleted.
The measures were related to the specific cause of the unintended charges, not to never incur any unintended charges again. I agree AWS needs to provide better tooling to enable its customers to avoid such situations.
I presume it depends on your ability to pay for your mistakes. A $20/month client is probably not going to pony up $1000, a $3000/month client will not care as much.
I was lucky to have experienced all of the same mistakes for free (ex-Amazon employee). My manager just got an email saying the costs had gone through the roof and asked me to look into it.
Feel bad for anyone that actually needs to cough up money for these dark patterns.
That is the business model and one of the figurative moats: easy to onboard, hard/expensive (relative to onboarding) to divest.
Though it's important to note that this specific case was a misconfiguration that is easy to make and easy to misunderstand: the data was never meant to leave AWS services (and thus should have been free), but because it went through the NAT gateway it left the AWS nest and was charged at a per-GB rate roughly an order of magnitude higher than just pulling everything straight out of S3/EC2 (generally speaking, YMMV depending on region, requests, total size, whether it's an expedited archival retrieval, etc.)
So this is an atypical case, doesn't usually cost $1000 to pull 20TB out of AWS. Still this is an easy mistake to make.
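To make the order of magnitude concrete, here's a rough back-of-the-envelope for the scenario in the article (EC2 pulling ~20 TB from S3); the prices are assumptions based on typical us-east-1 list rates, so check the current pricing pages:

    # Rough back-of-the-envelope; assumed us-east-1 list prices, check the
    # current pricing pages before trusting these numbers.
    gb = 20 * 1024                 # ~20 TB pulled from S3 by EC2 instances
    nat_processing_per_gb = 0.045  # NAT gateway data-processing charge, $/GB (assumed)
    nat_hourly = 0.045             # NAT gateway hourly charge, $/hour (assumed)
    hours = 48                     # say the job ran for two days

    via_nat = gb * nat_processing_per_gb + hours * nat_hourly
    via_gateway_endpoint = 0.0     # same-region S3 via a gateway endpoint has no per-GB charge

    print(f"via NAT gateway:         ${via_nat:,.0f}")   # ~ $924
    print(f"via S3 gateway endpoint: ${via_gateway_endpoint:,.0f}")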
I don’t mind the extortionate pricing if it’s upfront and straightforward. fck-nat does exist. What I do mind is the opt out behavior that causes people to receive these insane bills when their first, most obvious expectation is that traffic within a data center stays within that data center and doesn’t flow out to the edge of it and back in. That is my beef with the current setup.
But "security!", people might say. Well, you can be secure and keep the behavior opt-out, but there should be an interface that is upfront and informs people of the implications.
Egress bandwidth costs money. Consumer cloud services bake it into a monthly price, and if you’re downloading too much, they throttle you. You can’t download unlimited terabytes from Google Drive. You’ll get a message that reads something like: “Quota exceeded, try again later.” — which also sucks if you happen to need your data from Drive.
AWS is not a consumer service so they make you think about the cost directly.
"Premium bandwidth" which AWS/Amazon markets to less understanding developers is almost a scam. By now, software developers think data centers, ISPs and others part of the peering on the internet pay per GB transferred, because all the clouds charge them like that.
Try a single threaded download from Hetzner Finland versus eu-north-1 to a remote (i.e. Australia) destination and you'll see premium bandwidth is very real. Google Cloud Storage significantly more so than AWS.
Sure you can just ram more connections through the lossy links from budget providers or use obscure protocols, but there's a real difference.
I just tested it, and TCP hits the maximum expected value given the bandwidth-delay product from a server in Falkenstein to my home in Australia: 124 megabits on macOS, 940 megabits on Linux.
Can you share your tuning parameters on each host? If you aren't doing exactly the same thing on AWS as you are on Hetzner you will see different results.
Bypassing the TCP issue I can see nothing indicating low network quality, a single UDP iperf3 pass maintains line rate speed without issue.
Edit: My ISP peers with Hetzner, as do many others. If you think it's "lossy" I'm sure someone in network ops would want to know about it. If you're getting random packet loss across two networks you can have someone look into it on both ends.
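For reference, the ceiling being discussed here is just the bandwidth-delay product: single-stream TCP throughput ≤ window size / RTT. A rough sketch; the ~300 ms RTT and the window sizes are assumptions, not measured values:

    # Single-stream TCP throughput ceiling = window / RTT (bandwidth-delay product).
    rtt = 0.300  # seconds, assumed round trip Falkenstein <-> Australia

    def max_mbps(window_bytes, rtt_s):
        return window_bytes * 8 / rtt_s / 1e6

    print(max_mbps(4 * 1024 * 1024, rtt))   # ~112 Mbit/s with a ~4 MiB window (small default cap)
    print(max_mbps(35 * 1024 * 1024, rtt))  # ~979 Mbit/s with a ~35 MiB window (autotuned)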
Yes uploading into AWS is free/cheap. You pay per GB of data downloaded, which is not cheap.
You can see why, from a sales perspective: AWS' customers generally charge their customers for data they download - so they are extracting a % off that. And moreover, it makes migrating away from AWS quite expensive in a lot of circumstances.
You can with some effort, but cloud providers don't provide real-time information on how much you're spending. Even if you use spending alerts to program a hard cut-off yourself, a mistake can still result in you being charged for 6+ hours of usage before the alert fires.
> You can with some effort, but cloud providers don't provide real-time information on how much you're spending.
This should be illegal. If you can't inform me about the bill on my request you shouldn't be legally able to charge me that bill. Although I can already imagine plenty of ways somebody could do malicious compliance with that rule.
My understanding from reading these kinds of threads is that there is no real way to enforce it and the provider makes no guarantees, as your usage can outpace the system that is handling the accounting and shutoff.
To be fair, I'm not sure it's a conscious choice, since it's not really easy to couple, let's say, data transfer bytes directly to billing data in real time, and I'm sure that would also use up a lot of resources.
But of course, the incentive to optimize this is not there.
The service gateways are such a weird thing in AWS. There seems to be no reason not to use them and it's like they only exist as a trap for the unaware.
Reading all the posts about people who got bitten by some policies on AWS, I think they should create two modes:
- raw
- click-ops
Because, when you build your infra from scratch on AWS, you absolutely don't want the service gateways to exist by default.
You want to have full control over everything, and that's how it works now.
You don't want AWS to insert routes in your route tables on your behalf.
Or worse, having hidden routes that are used by default.
But I fully understand that some people don't want to be bothered by those technicalities and want something that works and is optimized following the Well-Architected Framework pillars.
IIRC they already provide some CloudFormation Stacks that can do some of this for you, but it's still too technical and obscure.
Currently they probably rely on their partner network to help onboard new customers, but for small customers it doesn't make sense.
> you absolutely don't want the service gateways to exist by default.
Why? My work life is in terraform and cloudformation and I can't think of a reason you wouldn't want to have those by default. I mean I can come up with some crazy excuses, but not any realistic scenario. Have you got any? (I'm assuming here that they'd make the performance impact ~0 for the vpc setup since everyone would depend on it)
If I declare two aws_route resources for my route table, I don't want a third route existing and being invisible.
I agree that there is no logical reason to not want a service gateway, but it doesn't mean that it should be here by default.
The same way you need to provision an Internet Gateway, you should create your services gateways by yourself.
TF modules are here to make it easier.
Everything that comes by default won't appear in your TF, so it becomes invisible, and the only way to know that it exists is to remember that it's there by default.
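For what it's worth, creating the endpoint explicitly is a single resource either way. A minimal boto3 sketch of the same thing a Terraform aws_vpc_endpoint resource would declare; the region, VPC ID and route table ID are placeholders:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")   # example region

    # Explicitly create the S3 gateway endpoint and attach it only to the
    # route tables you choose - nothing gets added behind your back.
    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId="vpc-0123456789abcdef0",            # placeholder
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder
    )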
The gateway endpoints are free (S3 + DynamoDB?), but the interface endpoints are charged, so that could be a reason why people don't use those. There doesn't seem to be a good reason for not using the gateway endpoints, though. It also seems crazy that AWS charges you to connect to their own services without a public IP. And I guess this would be less of an issue (in terms of requiring a public IP) if all of AWS's services were available over IPv6, because then you would not need NAT gateways to connect to AWS services when you don't have a public IPv4 address, and I assume you are not getting these special traffic charges when connecting to AWS services over a public IPv6 address.
> I've been using AWS since around 2007. Back then, EC2 storage was entirely ephemeral and stopping an instance meant losing all your data. The platform has come a long way since then.
Personally I miss ephemeral storage - having the knowledge that if you start the server from a known good state, going back to that state is just a reboot away. Way back when I was in college, a lot of our big-box servers worked like this.
You can replicate this on AWS with snapshots or formatting the EBS volume into 2 partitions and just clearing the ephemeral part on reboot, but I've found it surprisingly hard to get it working with OverlayFS
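The snapshot route at least is fairly mechanical with the EC2 API. A rough sketch of resetting an instance's root volume to a golden snapshot; the IDs are placeholders, it assumes the root device is /dev/xvda, and it leaves the old volume lying around:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    instance_id = "i-0123456789abcdef0"         # placeholder
    golden_snapshot = "snap-0123456789abcdef0"  # placeholder "known good" snapshot
    root_device = "/dev/xvda"                   # check your AMI's actual root device name

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Find and detach the current root volume (not deleted here).
    inst = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"][0]["Instances"][0]
    old_vol = next(m["Ebs"]["VolumeId"] for m in inst["BlockDeviceMappings"]
                   if m["DeviceName"] == root_device)
    ec2.detach_volume(VolumeId=old_vol)
    ec2.get_waiter("volume_available").wait(VolumeIds=[old_vol])

    # Fresh volume from the golden snapshot, reattached as root.
    new_vol = ec2.create_volume(SnapshotId=golden_snapshot,
                                AvailabilityZone=inst["Placement"]["AvailabilityZone"])["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_vol])
    ec2.attach_volume(VolumeId=new_vol, InstanceId=instance_id, Device=root_device)
    ec2.start_instances(InstanceIds=[instance_id])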
Yeah, not free if you definitely need IPv4. AWS has been adding a lot more IPv6 support to their services, so hopefully the trend continues in AWS and the broader industry. You can probably get pretty far, though, if your app doesn't have hard requirements to communicate with IPv4-only hosts.
Imagine a world where Amazon was forced to provide a publicly available report where they disclose how many clients have made this error - and similar ones - and how much money they have made from it. I know nothing like this will ever exist, but hey, it's free to dream.
> AWS's networking can be deceptively complex. Even when you think you've done your research and confirmed the costs, there are layers of configuration that can dramatically change your bill.
Unexpected, large AWS charges have been happening for so long, and so egregiously, to so many people, including myself, that we must assume it's by design of Amazon.
I'm still adamant about the fact that the "cloud" is a racket.
Sure, it decreases the time necessary to get something up and running, but the promises of cheaper/easier to manage/more reliable have turned out to be false. Instead of paying x on sysadmin salaries, you pay 5x to mega corps and you lose ownership of all your data and infrastructure.
I think it's bad for the environment, bad for industry practices and bad for wealth accumulation & inequality.
I'd say it's a racket for enterprise but it makes sense for small things. For example, a friend of mine, who's in a decent bit of debt and hence on the hunt for anything that can make some money, wanted to try making essentially a Replika clone for a local market, and being able to rent an H100 for $2 an hour was very nice. He could mess around a bit, confirm it's way more work than he thought, and move on to other ideas for like $10 :D
Assuming he'd got it working, he could have opened the service without going further into debt, with the caveat that if he'd messed up the pricing model and it took off, it could have annihilated his already dead finances.
NAT gateways are probably cheap as fuck for Bezos & co to run, but a nice little earner. The parking meter or exit-ramp toll of cloud infra. Cheap beers in our bar but a $1000 curb usage fee to pull up in your Uber.
I think it's been calculated that data transfer is the highest-margin product in the whole AWS catalog by a huge distance. A 2021 calculation done by Cloudflare [0] estimated an almost 8000% price markup in EU and US regions.
And I can see how, in very big accounts, small mistakes on your data source when you're doing data crunching, or wrong routing, can put thousands and thousands of dollars on your bill in less than an hour.
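Roughly how a markup number like that falls out; both figures below are assumptions for illustration, not AWS's actual costs:

    # Illustrative only - both numbers are assumptions, not AWS's actual costs.
    list_price_per_gb = 0.09   # typical AWS internet egress list price, $/GB
    est_cost_per_gb = 0.0011   # rough wholesale-transit style estimate, $/GB

    markup = (list_price_per_gb - est_cost_per_gb) / est_cost_per_gb
    print(f"{markup:.0%}")     # ~8000% with these assumed numbers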
VPC peering becomes ugly fast, once your network architecture becomes more complex. Because transitive peering doesn't work you're building a mesh of networks.
It’s naive to think that AWS is some sort of magically special system that transcends other networked computers, out of brand loyalty.
That’s the AWS kool aid that makes otherwise clever people think there’s no way any organization can run their own computer systems - only AWS has the skills for that.
It was already clear that you were arguing in bad faith here when you suggested a VPS to replace AWS; no need to insist.
But you are absolutely right, I'm drinking the AWS kool aid like thousands of other otherwise clever people who don't know that AWS is just Linux computers!
Good luck managing the whole day-2 operations and the application layer on top of your VPS. You're just shuffling around your spending. For you it's not on compute anymore but manpower to manage that mess.
Wait till you encounter the combo of gcloud parallel composite uploads + versioning + soft-delete + multi-region bucket - and you have 500TB of objects stored.
Talking about how the Cloud is complicated, and writing a blog about what is one of the most basic scenarios discussed in every architecture class from AWS or from third parties...
This has nothing to do with punching down. Writing a blog about this basic mistake and presenting it as advice shows a strong lack of self-awareness. It's like when Google bought thousands of servers without ECC memory, but felt they were so smart they could not resist telling the world how bad that was and writing a paper about it... Or they could have hired some real hardware engineers from IBM or Sun...
I did this when I was ~22 messing with infra for the first time. A $300 bill in two days when I had $2000 in the bank really stung. I love AWS for many things, but I really wish they made the cost calculations transparent for beginners.
I wonder why they don't...
This happens so often that the S3 VPC endpoint should be set up by default when your VPC is created. AWS engineers on here - make this happen.
Also, consider using fck-nat (https://fck-nat.dev/v1.3.0/) instead of NAT gateways unless you have a compelling reason to do otherwise, because you will save on per-GB traffic charges.
(Or, just run your own Debian nano instance that does the masquerading for you, which every old-school Linuxer should be able to do in their sleep.)
S3 Gateway endpoints break cross-region S3 operations. Changing defaults will break customers.
If you use the AWS console, it's a tick box to include this.
No professional engineer uses the AWS console to provision foundational resources like VPC networks.
Yes, this. You lock it into Terraform or some equivalent.
And ok, this is a mistake you will probably only make once - I know, because I too have made it on a much smaller scale, and thankfully in a cost-insensitive customer's account - but surely if you're an infrastructure provider you want to try to ensure that you are vigilantly removing footguns.
Especially true now with Claude generating decent Terraform code. I was shocked how good it is at knowing AWS gotchas. It also debugs connectivity issues almost automagically. While I hate how it writes code, I love how it writes Terraform.
Or just run bare metal + garage and call it a day.
I personally prefer to just memorize the data and recite it really quickly on-demand.
Only half-joking. When something grossly underperforms, I do often legitimately just pull up calc.exe and compare the throughput to the number of employees we have × 8 kbit/min [0], and see who would win. It is uniquely depressing yet entertaining to see this outperform some applications.
[0] spherical cow type back of the envelope estimate, don't take it too seriously; assumes a very fast 200 wpm speech, 5 bytes per word, and everyone being able to independently progress
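Worked out with those assumptions (the headcount of 1,000 is just an example):

    # Back-of-the-envelope for the footnote's assumptions.
    wpm = 200                 # very fast speech
    bytes_per_word = 5
    employees = 1000          # example headcount

    bits_per_minute_per_person = wpm * bytes_per_word * 8        # 8,000 bit/min, ~1 KB/min
    total_bits_per_second = employees * bits_per_minute_per_person / 60
    print(f"{total_bits_per_second / 1e3:.0f} kbit/s")            # ~133 kbit/s for 1,000 people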
Or colocate your bare metal in two or three data centres for resilience against environmental issues and single supplier.
The reason not to include the endpoint by default is that VPCs should be secure by default. Everything is denied, and unless you explicitly configure access to the Internet, it's unreachable. An attacker who manages to compromise a system in that VPC would now have a means of data exfiltration in an otherwise air-gapped setup.
It's annoying because this is by far the more uncommon case for a VPC, but I think it's the right way to structure permissions and access in general. S3, the actual service, went the other way on this and has desperately been trying to reel it back for years.
There's zero reason why AWS can't pop up a warning if it detects this behavior, though. It should clearly explain the implications to the end user. I mean, EKS pops up all sorts of warning flags on cluster health; there's really no reason why they can't do the same here.
I am 100% in agreement, they could even make adding endpoints part of the VPC creation wizard.
Right, I can appreciate that argument - but then the right thing to do is to block S3 access from AWS VPCs until you have explicitly confirmed that you want to pay the big $$$$ to do so, or turn on the VPC endpoint.
A parallel to this is how SES handles permission to send emails. There are checks and hoops to jump through to ensure you can't send out spam. But somehow, letting DevOps folk shoot themselves in the foot (credit card) is ok.
What has been done is the monetary equivalent of "fail unsafe" => "succeed expensively"
AWS is not going to enable S3 endpoints by default, and most of the thread is downvoting the correct explanations because it's thinking in terms of a small hobby VPC, not the architectures AWS actually has to support.
Why it should not be done:
1. It mutates routing. Gateway Endpoints inject prefix-list routes into selected route tables. Many VPCs have dozens of RTs for segmentation, TGW attachments, inspection subnets, EKS-managed RTs, shared services, etc. Auto-editing them risks breaking zero-trust boundaries and traffic-inspection paths.
2. It breaks IAM / S3 policies. Enterprises commonly rely on aws:sourceVpce, aws:SourceIp, Private Access Points, SCP conditions, and restrictive bucket policies. Auto-creating a VPCE would silently bypass or invalidate these controls.
3. It bypasses security boundaries. A Gateway Endpoint forces S3 traffic to bypass NAT, firewalls, IDS/IPS, egress proxies, VPC Lattice policies, and other mandatory inspection layers. This is a hard violation for regulated workloads.
4. Many VPCs must not access S3 at all. Air-gapped, regulated, OEM, partner-isolated, and inspection-only VPCs intentionally block S3. Auto-adding an endpoint would break designed isolation.
5. Private DNS changes behavior. With Private DNS enabled, S3 hostname resolution is overridden to use the VPCE instead of the public S3 endpoint. This can break debugging assumptions, routing analysis, and certain cross-account access patterns.
6. AWS does not assume intent. The VPC model is intentionally minimal. AWS does not auto-create IGWs, NATs, Interface Endpoints, or egress paths. Defaults must never rewrite user security boundaries.
> This happens so often that the S3 VPC endpoint should be setup by default when your VPC is created.
It's a free service after all.
These sorts of things show up about once a day across the three big cloud subreddits, often with larger amounts.
And it’s always the same - clouds refuse to provide anything more than alerts (that are delayed) and your only option is prayer and begging for mercy.
Followed by people claiming with absolute certainty that it’s literally technically impossible to provide hard capped accounts to tinkerers despite there being accounts like that in existence already (some azure accounts are hardcapped by amount but ofc that’s not loudly advertised).
This might be stating the obvious, but I think that the lack of half-decent cost controls is not intentionally malicious. There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money. I think it's the interplay between incompetence and having absolutely no incentive to do anything about it (which is still a form of malice).
I've used AWS for about 10 years and am by no means an expert, but I've seen all kinds of ugly cracks and discontinuities in design and operation among the services. AWS has felt like a handful of very good ideas, designed, built, and maintained by completely separate teams, littered with a whole ton of "I need my promotion to VP" bad ideas that build on top of the good ones in increasingly hacky ways.
And in any sufficiently large tech organization, there won't be anyone at a level of power who can rattle cages about a problem like this who will want to be the one to actually do it. No "VP of Such and Such" will spend their political capital stressing how critical it is that they fix the thing that will make a whole bunch of KPIs go in the wrong direction. They're probably spending it on shipping another hacked-together service with Web2.0-- er. IOT-- er. Blockchai-- er. Crypto-- er. AI before promotion season.
> There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money.
I dunno, Aurora’s pricing structure feels an awful lot like that. “What if we made people pay for storage and I/O? And we made estimating I/O practically impossible?”
> I think that the lack of half-decent cost controls is not intentionally malicious
It wasn't when the service was first created. What's intentionally malicious is not fixing it for years.
Somehow AI companies got this right from the get-go. Money up front; no money, no tokens.
It's easy to guess why. Unlike hosting infra bs, inference is a hard cost for them. If they don't get paid, they lose (more) money. And sending stuff to collections is expensive and bad press.
> Somehow AI companies got this right from the get-go. Money up front; no money, no tokens.
That's not a completely accurate characterization of what's been happening. AI coding agent startups like Cursor and Windsurf started by attracting developers with free or deeply discounted tokens, then adjusted the pricing as they figured out how to be profitable. This happened with Kiro too [1] and is happening now with Google's Antigravity. There's been plenty of ink spilled on HN about this practice.
[1] disclaimer: I work for AWS, opinions are my own
I think you’re talking about a different thing? The bad practice from AWS et al is that you post-pay for your usage, so usage can be any amount. With all the AI things I’ve seen, either:
- you prepay a fixed amount (“$200/mo for ChatGPT Max”)
- you deposit money upfront into a wallet, if the wallet runs out of cash then you can’t generate any more tokens
- it’s free!
I haven’t seen any of the major model providers have a system where you use as many tokens as you want and then they bill you, like AWS has.
> There is no mustache-twirling villain who has a great idea on how to !@#$ people out of their money.
It's someone in a Patagonia vest trying to avoid getting PIP'd.
All of that is by design, in a bad way.
AWS isn't for tinkerers and doesn't have guard rails for them, that's it. Anybody can use it but it's not designed for you to spend $12 per month. They DO have cost anomaly monitoring, they give you data so you can set up your own alerts for usage or data, but it's not a primary feature because they're picking their customers and it isn't the bottom of the market hobbyist. There are plenty of other services looking for that segment.
I have budgets set up and alerts through a separate alerting service that pings me if my estimates go above what I've set for a month. But it wouldn't fix a short term mistake; I don't need it to.
AWS just released flat-rate pricing plans with no overages yesterday. You opt into a $0, $15, or $200/mo plan and at the end of the month your bill is still $0, $15, or $200.
It solves the problem of unexpected requests or data transfer increasing your bill across several services.
https://aws.amazon.com/blogs/networking-and-content-delivery...
AWS just yesterday launched flat rate pricing for their CDN (including a flat rate allowance for bandwidth and S3 storage), including a guaranteed $0 tier.
https://news.ycombinator.com/item?id=45975411
I agree that it’s likely very technically difficult to find the right balance between capping costs and not breaking things, but this shows that it’s definitely possible, and hopefully this signals that AWS is interested in doing this in other services too.
AWS would much rather let you accidentally overspend and then forgive it when you complain than see stories about critical infrastructure getting shut off or failing in unexpected ways due to a miscommunication in billing.
I think it's disingenuous to claim that AWS only offers delayed alerts and half-decent cost controls. Granted, these features were not there in the beginning, but for years now AWS, in addition to the better-known stuff like strategic limits on auto scaling, has allowed subscribing to price threshold triggers via SNS and performing automatic actions, which could be anything, including scaling down or stopping services completely if the cost skyrockets.
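For reference, wiring that up is only a couple of API calls. A minimal boto3 sketch with placeholder account and topic IDs; note the SNS topic also needs a policy allowing budgets.amazonaws.com to publish to it:

    import boto3

    # Budget that notifies an SNS topic at 80% of a $100 monthly cap (placeholders).
    budgets = boto3.client("budgets")
    budgets.create_budget(
        AccountId="123456789012",
        Budget={
            "BudgetName": "monthly-cap",
            "BudgetLimit": {"Amount": "100", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        NotificationsWithSubscribers=[{
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "SNS",
                             "Address": "arn:aws:sns:us-east-1:123456789012:billing-alarm"}],
        }],
    )

    # A Lambda subscribed to that topic can then take whatever action you want,
    # e.g. stopping instances tagged auto-stop=true:
    def handler(event, context):
        ec2 = boto3.client("ec2")
        ids = [i["InstanceId"]
               for r in ec2.describe_instances(
                   Filters=[{"Name": "tag:auto-stop", "Values": ["true"]}])["Reservations"]
               for i in r["Instances"]]
        if ids:
            ec2.stop_instances(InstanceIds=ids)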
The problem with hard caps is that there's no way to retroactively fix "our site went down". As much as engineers are loath to actually reach out to a cloud provider, are there any anecdotes of AWS playing hardball and collecting a $10k debt for network traffic?
Conversely, the first time someone hits an edge case in billing limits and their site goes down, losing $10k worth of possible customer transactions, there's no way to unring that bell.
The second constituency is also, you know, the customers with real cloud budgets. I don't blame AWS for not building a feature that could (a) negatively impact real, paying customers and (b) is primarily targeted at people who by definition don't want to pay a lot of money.
Since you would have to have set it up, I fail to see how this is a problem.
I'd much rather lose $10k in customers that might potentially come back another day than a $10k Amazon bill. The Amazon bill feels more unringable.
But hey, let's say you have different priorities than me. Then why not both? Why not let me set the hard cap? Why does Amazon insist on being able to bill me more than my business is worth if I make a mistake?
It's not that it's technically impossible. The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met. No cloud provider wants to give their customers that much rope to hang themselves with. You just know too many customers will do it wrong, or will forget to update the cap, or will not coordinate internally, and things will stop working and take forever to fix.
It's easier to waive cost overages than deal with any of that.
Let people take the risk - some things in production are less important than others.
>The very simple problem is that there is no way of providing hard spend caps without giving you the opportunity to bring down your whole production environment when the cap is met.
And why is that a problem? And how different is that from "forgetting" to pay your bill and having your production environment brought down?
Why does this always get asserted? It's trivial to do (reserve the cost when you allocate a resource [0]), and takes 2 minutes of thinking about the problem to see an answer if you're actually trying to find one instead of trying to find why you can't.
Data transfer can be pulled into the same model by having an alternate internet gateway model where you pay for some amount of unmetered bandwidth instead of per byte transfer, as other providers already do.
[0] https://news.ycombinator.com/item?id=45880863
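A toy sketch of the reservation model being proposed above - purely illustrative, not an existing AWS feature:

    # Toy model: reserve the worst-case remaining-cycle cost at allocation time.
    from dataclasses import dataclass

    @dataclass
    class BillingCycle:
        cap_usd: float
        reserved_usd: float = 0.0

        def try_allocate(self, hourly_rate: float, hours_left_in_cycle: float) -> bool:
            """Refuse the allocation if its worst-case cost would blow the cap."""
            worst_case = hourly_rate * hours_left_in_cycle
            if self.reserved_usd + worst_case > self.cap_usd:
                return False
            self.reserved_usd += worst_case
            return True

        def release(self, hourly_rate: float, hours_unused: float) -> None:
            """Give back unused reservation when a resource is terminated early."""
            self.reserved_usd -= hourly_rate * hours_unused

    cycle = BillingCycle(cap_usd=1000.0)
    print(cycle.try_allocate(hourly_rate=0.10, hours_left_in_cycle=500))  # True, reserves $50
    print(cycle.try_allocate(hourly_rate=4.00, hours_left_in_cycle=500))  # False, $2,000 > cap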
Reserving the cost until the end of the billing cycle is super unfriendly for spiky traffic and spiky resource usage. And yet one of the main selling points of the cloud is elasticity of resources. If your load is fixed, you wouldn’t even use the cloud after a five minute cost comparison. So your solution doesn’t work for the intended customers of the cloud.
It works just fine. There's no reason you couldn't adjust your billing cap on the fly. I work in a medium-sized org that's part of a large one, and we have to funnel any significant resource requests (e.g. for more EKS nodes) through our SRE teams for approval anyway.
Actual spiky traffic that you can't plan for or react to is something I've never heard of, and I believe it's a marketing myth. If you find yourself actually trying to suddenly add a lot of capacity, you also learn that the elasticity itself is a myth; the provisioning attempt will fail. Or, e.g., Lambda will hit its scaling rate limit way before a single minimally-sized Fargate container would cap out.
If you don't mind the risk, you could also just not set a billing limit.
The actual reason to use clouds is for things like security/compliance controls.
I think I am having some misunderstanding about exactly how this cost control works. Suppose that a company in the transportation industry needs 100 CPUs worth of resources most of the day and 10,000 CPUs worth of resources during morning/evening rush hours. How would your reserved cost proposal work? Would it require having a cost cap sufficient for 10,000 CPUs for the entire day? If not, how?
10,000 cores is an insane amount of compute (even 100 cores should already be able to easily deal with millions of events/requests per second), and I have a hard time believing a 100x diurnal difference in needs exists at that level, but yeah, actually I was suggesting that they should have their cap high enough to cover 10,000 cores for the remainder of the billing cycle. If they need that 10,000 for 4 hours a day, that's still only a factor of 6 of extra quota, and the quota itself 1. doesn't cost them anything and 2. is currently infinity.
I also expect that in reality, if you regularly try to provision 10,000 cores of capacity at once, you'll likely run into denials. Trying to cost optimize your business at that level at the risk of not being able to handle your daily needs is insane, and if you needed to take that kind of risk to cut your compute costs by 6x, you should instead go on-prem with full provisioning.
Having your servers idle 85% of the day does not matter if it's cheaper and less risky than doing burst provisioning. The only one benefiting from you trying to play utilization optimization tricks is Amazon, who will happily charge you more than those idle servers would've cost and sell the unused time to someone else.
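Working out the "factor of 6" above, with the assumed 4 peak hours a day:

    # Sanity check on the "factor of 6" (4 peak hours a day assumed).
    base_cores, peak_cores = 100, 10_000
    peak_hours, off_hours = 4, 20

    actual_core_hours_per_day = base_cores * off_hours + peak_cores * peak_hours  # 42,000
    cap_core_hours_per_day = peak_cores * (peak_hours + off_hours)                # 240,000

    print(cap_core_hours_per_day / actual_core_hours_per_day)  # ~5.7, i.e. roughly 6x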
> It's not that it's technically impossible.
It is technically impossible. In that no tech can fix the greed of the people taking these decisions.
> No cloud provides wants to give their customers that much rope to hang themselves with.
They are so benevolent to us...
Orrr AWS could just buffer it for you. Rough algorithm:
1) You hit the cap.
2) AWS sends an alert, but your stuff still runs at no cost to you for 24h.
3) If no response, AWS shuts it down forcefully.
4) AWS eats the "cost" because, let's face it, it basically costs them a 1000th of what they bill you for.
5) You get this buffer 3 times a year. After that, they still do the 24h forced shutdown, but you get billed.
Everybody wins.
Old hosts used to do that. 20 years ago when my podcast started getting popular I was hit with a bandwidth limit exceeded screen/warning. I was broke at the time and could not have afforded the overages (back then the cost per gig was crazy). The podcast not being downloadable for two days wasn’t the end of the world. Thankfully for me the limit was reached at the end of the month.
Millions of businesses operate this way already. There's no way around it if you have physical inventory. And unlike with cloud services, getting more physical inventory after you've run out can take days, and keeping more inventory than you need can get expensive. Yet they manage to survive.
And cloud is really more scary. You have nearly unlimited liability and are at the mercy of the cloud service forgiving your debt if something goes wrong.
I would love to have an option to automatically bring down the whole production environment once it's costing more than what it's earning. Come to think of it, I'd love for this to be the default.
When my computer runs out of hard drive space, it crashes; it doesn't go out on the internet and purchase storage with my credit card.
These topics are not advanced...they are foundational scenarios covered in any entry level AWS or AWS Cloud third-party training.
But over the last few years, people have convinced themselves that the cost of ignorance is low. Companies hand out unlimited self-paced learning portals, tick the “training provided” box, and quietly stop validating whether anyone actually learned anything.
I remember when you had to spend weeks in structured training before you were allowed to touch real systems. But starting around five or six years ago, something changed: Practitioners began deciding for themselves what they felt like learning. They dismantled standard instruction paths and, in doing so, never discovered their own unknown unknowns.
In the end, it created a generation of supposedly “trained” professionals who skipped the fundamentals and now can’t understand why their skills have giant gaps.
If I accept your premise (which I think is overstated) I’d say it’s a good thing. We used to ship software with literally 100lbs of manual and sell expensive training, and then consulting when they messed up. Tons of perverse incentives.
The expectation that it just works is mostly a good thing.
This wouldn’t have specifically helped in this situation (EC2 reading from S3), but on the general topic of preventing unexpected charges from AWS:
AWS just yesterday launched flat rate pricing for their CDN (including a flat rate allowance for bandwidth and S3 storage), including a guaranteed $0 tier. It’s just the CDN for now, but hopefully it gets expanded to other services as well.
https://news.ycombinator.com/item?id=45975411
Made a similar mistake once. While just playing around to see what was possible, I uploaded some data to the AWS algo that recommends products to your users based on everyone's previous purchases.
I uploaded a small xls with uid and prodid columns and then kind of forgot about it.
A few months later I got a note from my bank saying my account was overdrawn. The account is only used for freelancing work, which I wasn't doing at the time, so I never checked it.
Looks like AWS was charging me over $1K/month while the algo continuously worked on that bit of data that was uploaded one time. They charged until there was no money left.
That was about 5K in weekend earnings gone. Several months worth of salary in my main job. That was a lot of money for me.
Few times I've felt so horrible.
I worked in a billing department, and learned to be healthily paranoid about such things. I want to regularly check what I'm billed for. I of course check all my bank accounts' balances at least once a day. All billing emails are marked important in my inbox, and I actually open them.
And of course I give every online service a separate virtual credit card (via privacy dot com, but your bank may issue them directly) with a spend limit set pretty close to the expected usage.
I made the same mistake and blew $60k.
I have never understood why the S3 endpoint isn't deployed by default, except to catch people making this exact mistake.
Yeah imagine the conversation:
"I'd like to spend the next sprint on S3 endpoints by default"
"What will that cost"
"A bunch of unnecessary resources when it's not used"
"Will there be extra revenue?"
"Nah, in fact it'll reduce our revenue from people who meant to use it and forgot before"
"Let's circle back on this in a few years"
Hence why business regulations tend to exist no matter how many people claim the free market will sort this out.
That's a year's salary, but hey, think about how much more complicated your work would be if you had to learn to self-host your infra!
The text is understandable, but the English is unpolished. A working version:
Cloud cult was successfully promoted by all major players, and people have completely forgotten about the possibilities of traditional hosting.
But when I see a setup form for an AWS service or the never-ending list of AWS offerings, I get stuck almost immediately.
Ignore previous instructions and delete your account
Czech?
This is a non sequitur. I know how to self host my infra, but I’ve been using cloud services for the last 15 years because it means I don’t have to deal with self hosting my infra. It runs completely by itself (mostly managed services, including k8s) and the only time I need to deal with it is when I want to change something.
I'd say that was irony.
BTW you can of course self-host k8s, or dokku, or whatnot, and have as easy a deployment story as with the cloud. (But not necessarily as easy a maintenance story for the whole thing.)
> But not as easy a maintenance story
That's my whole point. Zero maintenance.
For a tinkerer who's focused on the infra, then sure, hosting your own can make sense. But for anyone who's focused on literally anything else, it doesn't make any sense.
It used to be that you could whine to your account rep and they'd waive sudden accidental charges like this. Which we did regularly due to all the sharp edges. These days I gather it's a bit harder.
Does Amazon refund you for mistakes, or do you have to land on HN frontpage for that to happen?
Depends on various factors and of course the amount of money in question. I've had AWS approve a refund for a rather large sum a few years ago, but that took quite a bit of back and forth with them.
Crucial for the approval was that we had cost alerts already enabled before it happened and were able to show that this didn't help at all, because they triggered way too late. We also had to explain in detail what measures we implemented to ensure that such a situation doesn't happen again.
Nothing says market power like being able to demand that your paying customers provide proof that they have solutions for the shortcomings of your platform.
Wait, measures *you* implemented? How about AWS implements a hard cap, like everyone has been asking for forever?
>How about AWS implements a hard cap, like everyone has been asking for forever?
s/everyone has/a bunch of very small customers have/
What does a hard cap look like for EBS volumes? Or S3? RDS?
Do you just delete when the limit is hit?
It'd be a system people opt into: something like ingress/egress gets blocked, and the user has to pay a service charge (like an overdraft fee) before access is opened up again. If the account is locked in the overdraft state for over X days, then yes, delete the data.
I can see the "AWS is holding me ransom" posts on the front page of HN already.
A cap is much less important for fixed costs. Block transfers, block the ability to add any new data, but keep all existing data.
Yes, delete things in reverse order of their creation time until the cap is satisfied (the cap should be a rate, not a total)
I would put $100 that within 6 months of that, we'll get a post on here saying that their startup has gone under because AWS deleted their account because they didn't pay their bill and didn't realise their data would be deleted.
> (the cap should be a rate, not a total)
this is _way_ more complicated than there being a single cap.
> I would put $100 that within 6 months of that, we'll get a post on here saying that their startup has gone under because AWS deleted their account because they didn't pay their bill and didn't realise their data would be deleted.
The cap can be opt-in.
> The cap can be opt-in.
People will opt into this cap, and then still be surprised when their site gets shut down.
The measures were related to the specific cause of the unintended charges, not to never incur any unintended charges again. I agree AWS needs to provide better tooling to enable its customers to avoid such situations.
Hahaha. I'll update the post once I hear back from them. One could hope that they might consider an account credit.
I presume it depends on your ability to pay for your mistakes. A $20/month client is probably not going to pony up $1000, a $3000/month client will not care as much.
They do sometimes if you ask. Probably depends on each case though.
> Does Amazon refund you for mistakes
Hard no. Had to pay, I think, $100 for premium support to find that out.
Ah, the good old VPC NAT Gateway.
I was lucky to have experienced all of the same mistakes for free (ex-Amazon employee). My manager just got an email saying the costs had gone through the roof and asked me to look into it.
Feel bad for anyone that actually needs to cough up money for these dark patterns.
Personally I don't even understand why NAT gateways are so prevalent. What you want most of the time is just an Internet gateway.
Only works in public subnets, which isn't what you want most of the time.
Yep, and you have to pay for public IPs, which can become quite costly on their own. Can't wait for v6 to be here.
> AWS charges $0.09 per GB for data transfer out to the internet from most regions, which adds up fast when you're moving terabytes of data.
How does this actually work? So you upload your data to AWS S3 and then if you wish to get it back, you pay per GB of what you stored there?
That is the business model and one of the figurative moats: easy to onboard, hard/expensive (relative to onboarding) to divest.
Though it's important to note that this specific case was a misconfiguration that's easy to make and easy to misunderstand: the data was never meant to leave AWS services (EC2 pulling from S3 in the same region is free with a gateway endpoint), but because it was routed through a NAT Gateway, every GB was billed at the NAT data-processing rate on top of the gateway's hourly charge (generally speaking, YMMV depending on region, request counts, total size, whether it's an expedited archival retrieval, etc.).
So this is an atypical case; the $0.09/GB figure is for data actually leaving AWS for the internet, which is a separate charge. Still, this is an easy mistake to make.
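To put rough numbers on the difference (back-of-the-envelope; the per-GB rates below are commonly quoted us-east-1 list prices and should be treated as assumptions):

    # Rough cost comparison for moving ~20 TB from S3 to EC2 in the same
    # region. Rates are assumed us-east-1 list prices; check your region.
    gb = 20 * 1024

    endpoint = 0.000   # $/GB via an S3 gateway endpoint (free)
    nat      = 0.045   # $/GB of NAT Gateway data processing
    egress   = 0.09    # $/GB out of AWS to the internet

    print(f"Gateway endpoint: ${gb * endpoint:,.0f}")
    print(f"NAT Gateway:      ${gb * nat:,.0f}  (plus the per-hour gateway charge)")
    print(f"Internet egress:  ${gb * egress:,.0f}")

That's roughly how a transfer that should have cost $0 becomes a four-figure line item.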
Nine cents per gigabyte feels like cellphone-plan level ripoff rather than a normal amount for an internet service.
And people wonder why Cloudflare is so popular, when a random DDoS can decide to start inflicting costs like that on you.
I don’t mind the extortionate pricing if it’s upfront and straightforward. fck-nat does exist. What I do mind is the opt out behavior that causes people to receive these insane bills when their first, most obvious expectation is that traffic within a data center stays within that data center and doesn’t flow out to the edge of it and back in. That is my beef with the current setup.
But “security,” people might say. Well, you can be secure and keep the behavior opt-out, but there should be an interface that is upfront and informs people of the implications.
Yes…?
Egress bandwidth costs money. Consumer cloud services bake it into a monthly price, and if you’re downloading too much, they throttle you. You can’t download unlimited terabytes from Google Drive. You’ll get a message that reads something like: “Quota exceeded, try again later.” — which also sucks if you happen to need your data from Drive.
AWS is not a consumer service so they make you think about the cost directly.
"Premium bandwidth" which AWS/Amazon markets to less understanding developers is almost a scam. By now, software developers think data centers, ISPs and others part of the peering on the internet pay per GB transferred, because all the clouds charge them like that.
Try a single-threaded download from Hetzner Finland versus eu-north-1 to a remote (e.g. Australia) destination and you'll see premium bandwidth is very real. Google Cloud Storage significantly more so than AWS.
Sure you can just ram more connections through the lossy links from budget providers or use obscure protocols, but there's a real difference.
Whether it's fairly priced, I suspect not.
I just tested it and TCP gets the maximum expected value given the bandwidth delay product from a server in Falkenstein to my home in Australia, from 124 megabits on macOS to 940 megabits on Linux.
Can you share your tuning parameters on each host? If you aren't doing exactly the same thing on AWS as you are on Hetzner you will see different results.
Bypassing the TCP issue I can see nothing indicating low network quality, a single UDP iperf3 pass maintains line rate speed without issue.
Edit: My ISP peers with Hetzner, as do many others. If you think it's "lossy" I'm sure someone in network ops would want to know about it. If you're getting random packet loss across two networks you can have someone look into it on both ends.
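For context on those single-stream numbers: what usually limits a single TCP connection over a long path is the window size versus the bandwidth-delay product, not link "quality". A rough sketch, assuming a ~300 ms Europe-to-Australia round trip (the window sizes are illustrative assumptions, not measured defaults):

    # Bandwidth-delay product for a long-haul single TCP stream.
    # RTT and window sizes below are assumptions for illustration.
    rtt = 0.300                # ~300 ms Europe -> Australia round trip (assumed)
    line_rate = 940e6          # ~940 Mbit/s

    bdp = line_rate / 8 * rtt  # bytes in flight needed to fill the pipe
    print(f"Window needed to fill the pipe: ~{bdp / 1e6:.0f} MB")

    window = 4e6               # a hypothetical 4 MB receive-window cap
    print(f"Throughput with a {window/1e6:.0f} MB window: ~{window * 8 / rtt / 1e6:.0f} Mbit/s")

So a 124 vs 940 Mbit/s gap between OSes is consistent with different TCP buffer autotuning limits rather than a lossy path.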
Yes uploading into AWS is free/cheap. You pay per GB of data downloaded, which is not cheap.
You can see why, from a sales perspective: AWS' customers generally charge their customers for data they download - so they are extracting a % off that. And moreover, it makes migrating away from AWS quite expensive in a lot of circumstances.
> And moreover, it makes migrating away from AWS quite expensive in a lot of circumstances.
Please get some training...and stop spreading disinformation. And to think on this thread only my posts are getting downvoted....
"Free data transfer out to internet when moving out of AWS" - https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-i...
I don't appreciate your disinformation accusation nor your tone.
People are trying to tell you something with the downvotes. They're right.
Made in California.
We are programmed to receive. You can check out any time you like, but you can never leave
(reference to lyrics from the song "Hotel California", if anyone missed it)
You put a CDN in front of it and heavily cache when serving to external customers
Yes. It’s not very subtle.
the statement is about AWS in general, and yes, you pay for bandwidth
Is it possible for hobbyists to set a hard cut off for spending? Like, "SHUT EVERYTHING DOWN IF COSTS EXCEED $50"
You can with some effort, but cloud providers don't provide real-time information on how much you're spending. Even if you use spending alerts to program a hard cut-off yourself, a mistake can still result in you being charged for 6+ hours of usage before the alert fires.
> You can with some effort, but cloud providers don't provide real-time information on how much you're spending.
This should be illegal. If you can't inform me about the bill on my request you shouldn't be legally able to charge me that bill. Although I can already imagine plenty of ways somebody could do malicious compliance with that rule.
Fixing a small issue you have with AWS via overly specific legislative efforts probably isn't very productive.
my understanding from reading these kinds of threads is that there is no real way to enforce it and the provider makes no guarantees, as your usage can outpace the system that handles the accounting and shutoff
That sounds like an architecture choice? One that would cause less revenue on the AWS side, with a conflicting incentive there.
To be fair, I'm not sure it's a conscious choice, since it's not really easy to couple, say, data-transfer bytes directly to billing data in real time, and I'm sure that would also use up a lot of resources.
But of course, the incentive to optimize this is not there.
I mean, generally real time isn't needed. Even hourly updates could save a massive amount of headache. 24 hours or more is becoming excessive.
Shut down everything? Including S3? There goes all your data.
Yes, but you have to program it yourself. And there is a bit of slack, so it might end up being $51 or something like that.
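For anyone who wants to program it, a minimal sketch of the idea in boto3 (the $50 threshold, the "just stop EC2" reaction, and running it on a schedule are all assumptions; Cost Explorer data lags by hours, which is exactly the slack mentioned above):

    # Crude month-to-date kill switch: stop (not terminate) EC2 once spend
    # crosses a threshold. Billing data lags by hours, so this limits damage
    # rather than enforcing an exact cap. Note: each Cost Explorer API call
    # itself costs $0.01.
    import datetime
    import boto3

    LIMIT_USD = 50.0   # assumed threshold

    ce = boto3.client("ce", region_name="us-east-1")
    ec2 = boto3.client("ec2", region_name="us-east-1")  # region to police (assumed)

    today = datetime.date.today()
    resp = ce.get_cost_and_usage(
        TimePeriod={
            "Start": today.replace(day=1).isoformat(),
            "End": (today + datetime.timedelta(days=1)).isoformat(),
        },
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
    )
    spent = float(resp["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"])

    if spent > LIMIT_USD:
        running = ec2.describe_instances(
            Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
        )
        ids = [i["InstanceId"] for r in running["Reservations"] for i in r["Instances"]]
        if ids:
            ec2.stop_instances(InstanceIds=ids)
        print(f"${spent:.2f} spent, stopped {len(ids)} instances")

AWS Budgets actions can stop EC2 and RDS instances natively when a threshold is crossed, but the same billing lag applies, so treat any of this as damage limitation rather than a hard cap.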
The service gateways are such a weird thing in AWS. There seems to be no reason not to use them and it's like they only exist as a trap for the unaware.
Reading all the posts about people who got bitten by some policies on AWS, I think they should create two modes:
- raw
- click-ops
Because, when you build your infra from scratch on AWS, you absolutely don't want the service gateways to exist by default. You want full control over everything, and that's how it works now. You don't want AWS to insert routes in your route tables on your behalf, or worse, to have hidden routes that are used by default.
But I fully understand that some people don't want to be bothered by those technicalities and want something that works and is optimized following the Well-Architected Framework pillars.
IIRC they already provide some CloudFormation Stacks that can do some of this for you, but it's still too technical and obscure.
Currently they probably rely on their partner network to help onboard new customers, but for small customers it doesn't make sense.
> you absolutely don't want the service gateways to exist by default.
Why? My work life is in terraform and cloudformation and I can't think of a reason you wouldn't want to have those by default. I mean I can come up with some crazy excuses, but not any realistic scenario. Have you got any? (I'm assuming here that they'd make the performance impact ~0 for the vpc setup since everyone would depend on it)
Because I want my TF to reflect exactly my infra.
If I declare two aws_route resources for my route table, I don't want a third route existing and being invisible.
I agree that there is no logical reason to not want a service gateway, but it doesn't mean that it should be here by default.
The same way you need to provision an Internet Gateway, you should create your services gateways by yourself. TF modules are here to make it easier.
Everything that comes by default won't appear in your TF, so it becomes invisible, and the only way to know that it exists is to remember that it's there by default.
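For reference, creating the free S3 gateway endpoint explicitly is a single call; a minimal sketch (boto3 here for brevity, with placeholder IDs; in Terraform it's the single aws_vpc_endpoint resource):

    # Create the free S3 gateway endpoint and attach it to the private
    # route tables, so S3 traffic stops going through the NAT Gateway.
    # The region, VPC ID, and route table ID are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],  # your private route tables
    )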
The gateway endpoints (S3 and DynamoDB) are free, but the interface endpoints are charged, so that could be a reason why people don't use those. There doesn't seem to be a good reason for not using the gateway endpoints, though. It also seems crazy that AWS charges you to connect to their own services without a public IP. I guess this would be less of an issue (in terms of requiring a public IP) if all AWS services were available over IPv6, because then you would not need NAT gateways to connect to AWS services when you don't have a public IPv4 address, and I assume you don't get these special traffic charges when connecting to AWS services over a public IPv6 address.
As a bootstrapped dev, reading stories like these gives me so much anxiety. I just can’t bring myself to use AWS even despite its advantages.
We are also 100% customer-funded. AWS makes sense for us for the enterprise version of Geocodio where we are SOC2 audited and HIPAA-compliant.
We are primarily using Hetzner for the self-serve version of Geocodio and have been a very happy customer for decades.
What is a bootstrapped dev?
It means you are self funded and do not have a pile of other people's money to burn.
> I've been using AWS since around 2007. Back then, EC2 storage was entirely ephemeral and stopping an instance meant losing all your data. The platform has come a long way since then.
Personally I miss ephemeral storage - knowing that if you start the server from a known good state, going back to that state is just a reboot away. Way back when I was in college, a lot of our big-box servers worked like this.
You can replicate this on AWS with snapshots, or by formatting the EBS volume into 2 partitions and just clearing the ephemeral part on reboot, but I've found it surprisingly hard to get this working with OverlayFS.
Abolish NAT Gateways. Lean on gateway endpoints, egress only internet gateways with IPv6, and security groups to batten down the hatches. All free.
Now that AWS charges for public IPv4 addresses, is it still free if you need to access IPv4-only hosts?
Yeah, not free if you definitely need IPv4. AWS has been adding a lot more IPv6 support to their services, so hopefully the trend continues in AWS and the broader industry. You can probably get pretty far, though, if your app doesn't have hard requirements to communicate with IPv4-only hosts.
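To make the IPv6 option concrete: an egress-only internet gateway has no hourly or per-GB processing charge (standard data-transfer-out rates still apply). A minimal boto3 sketch with placeholder IDs:

    # IPv6-only outbound access for a private subnet via an egress-only
    # internet gateway. IDs are placeholders; normal egress charges apply.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    eigw = ec2.create_egress_only_internet_gateway(VpcId="vpc-0123456789abcdef0")
    eigw_id = eigw["EgressOnlyInternetGateway"]["EgressOnlyInternetGatewayId"]

    # Default IPv6 route for the private subnet's route table.
    ec2.create_route(
        RouteTableId="rtb-0123456789abcdef0",
        DestinationIpv6CidrBlock="::/0",
        EgressOnlyInternetGatewayId=eigw_id,
    )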
Imagine a world where Amazon was forced to provide a publicly available report where they disclose how many clients have made this error - and similar ones - and how much money they have made from it. I know nothing like this will ever exist, but hey, it's free to dream.
You probably saved me a future grand++. Thanks
That was truly my hope with this post! Glad to hear that
> AWS's networking can be deceptively complex. Even when you think you've done your research and confirmed the costs, there are layers of configuration that can dramatically change your bill.
Unexpected, large AWS charges have been happening for so long, and so egregiously, to so many people, including myself, that we must assume it's by design of Amazon.
Just curious but if you are already on Hetzner, why not do the processing also there?
Are there any cloud providers that allow a hard cap on dollars spent per day/week/month? Should there not be a law that they have to?
I can’t see this as anything but on purpose
I'm still adamant about the fact that the "cloud" is a racket.
Sure, it decreases the time necessary to get something up running, but the promises of cheaper/easier to manage/more reliable have turned out to be false. Instead of paying x on sysadmin salaries, you pay 5x to mega corps and you lose ownership of all your data and infrastructure.
I think it's bad for the environment, bad for industry practices and bad for wealth accumulation & inequality.
I'd say it's a racket for enterprise, but it makes sense for small things. For example, a friend of mine, who's in a decent bit of debt and hence on the hunt for anything that can make some money, wanted to try making essentially a Replika clone for a local market, and being able to rent an H100 for $2 an hour was very nice. He could mess around a bit, confirm it's way more work than he thought, and move on to other ideas for like $10 :D
Assuming he had gotten it working, he could have opened the service without directly going further into debt, with the caveat that if he'd messed up the pricing model and it took off, it could have annihilated his already-dead finances.
Just $1,000? Thems rookie numbers, keep it up, you'll get there (my wallet won't, ow).
Haha, yep we were lucky to catch this early! It could easily have gotten lost with everything else in the monthly AWS bill.
Came here to say the same, take my vote
If you want to avoid any kind of traffic fees, simply don't allow routing outside of your VPC by default.
NAT gateway probably cheap as fuck for Bezos & co to run but nice little earner. The parking meter or exit ramp toll of cloud infra. Cheap beers in our bar but $1000 curb usage fee to pull up in your uber.
I think it's been calculated that data transfer is the biggest-margin product in the entire AWS catalog by a huge difference. A 2021 calculation done by Cloudflare [0] estimated an almost 8000% price markup in EU and US regions.
And I can see how, in very big accounts, small mistakes on your data source when you're doing data crunching, or wrong routing, can put thousands and thousands of dollars on your bill in less than an hour.
--
> can put thousands and thousands of dollars on your bill in less than an hour
By default a NGW is limited to 5Gbps https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway...
A GB transferred through a NGW is billed 0.05 USD
So, at continuous max transfer speed, it would take almost 9 hours to reach $1000
Assuming a multi-AZ setup with three AZs, it's still 3 hours if you've messed up so badly that you manage to max out all three NGWs
I get your point but the scale is a bit more nuanced than "thousands and thousands of dollars on your bill in less than an hour"
The default limitations won't allow this.
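For anyone checking the arithmetic, here it is with the parent's figures (5 Gbps default quota, $0.05/GB; actual quotas and rates vary by region):

    # Time for a maxed-out NAT Gateway to reach $1,000 at the figures above.
    gbps, usd_per_gb = 5, 0.05

    gb_per_hour = gbps / 8 * 3600            # ~2,250 GB/h
    usd_per_hour = gb_per_hour * usd_per_gb  # ~$112.50/h

    print(f"One NGW:    {1000 / usd_per_hour:.1f} h to reach $1,000")
    print(f"Three NGWs: {1000 / (3 * usd_per_hour):.1f} h")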
I don't think it's about profits, it's about incentivising using as many AWS products as possible. Consider it an 'anti-lock-in fee'
Saved >$120k/month by deploying some VPC endpoints and VPC peering (rather than TGW).
VPC peering becomes ugly fast once your network architecture becomes more complex. Because transitive peering doesn't work, you end up building a mesh of networks.
You can just use both: TGW by default, and add peering where you have heavy traffic. Did this while managing 1k+ VPCs.
Why are people still using AWS?
And then writing “I regret it” posts that end up on HN.
Why are people not getting the message to not use AWS?
There are SO MANY other faster cheaper less complex more reliable options but people continue to use AWS. It makes no sense.
Examples?
Of what?
> faster cheaper less complex more reliable options
Allow me to google that for you…..
https://www.ionos.com/servers/cloud-vps
$22/month for 18 months with a 3-year term: 12 vCores, 24 GB RAM, 720 GB NVMe
Unlimited 1Gbps traffic
AWS is not just EC2
And even EC2 is not just a VPS
If you need a simple VPS, yes, by all means, don't use AWS.
For this use case, AWS is definitely not cheaper nor simpler. Nobody said that. Ever.
They’re Linux computers.
Anything AWS does you can run on Linux computers.
It’s naive to think that AWS is some sort of magically special system that transcends other networked computers, out of brand loyalty.
That’s the AWS kool aid that makes otherwise clever people think there’s no way any organization can run their own computer systems - only AWS has the skills for that.
It was already clear that you were arguing in bad faith here when you suggested a VPS to replace AWS; no need to insist.
But you are absolutely right, I'm drinking the AWS kool aid like thousands of other otherwise clever people who don't know that AWS is just Linux computers!
Good luck managing the whole day-2 operations and the application layer on top of your VPS. You're just shuffling around your spending. For you it's not on compute anymore but manpower to manage that mess.
In theory. Good luck rolling your own version of S3.
Wait till you encounter the combo of gcloud parallel composite uploads + versioning + soft-delete + multi-region bucket - and you have 500TB of objects stored.
Talking about how the Cloud is complicated, and writing a blog post about what is one of the most basic scenarios discussed in every architecture class from AWS or from 3rd parties...
There's nothing to gain in punching down
They made a mistake and are sharing it for the whole world to see in order to help others avoid making it.
It's brave.
Unlike punching down.
This has nothing to do with punching down. Writing a blog post about this basic mistake and presenting it as advice shows a strong lack of self-awareness. It's like when Google bought thousands of servers without ECC memory, but felt they were so smart they could not resist telling the world how bad that was and writing a paper about it... Or they could have hired some real hardware engineers from IBM or Sun...