“Everything You Expect from a Cloud, Running on Your Terms”*
Except you own ops, management, extension, interoperability, access, security, scalability, redundancy… words cannot express how ridiculous all of the koober propaganda is.
The way I've described this for years: Kubernetes makes managing 1000 servers as easy as managing 20 servers, and makes managing 3 servers as easy as managing 20 servers.
My way of thinking about it is this: you have your own hyper-flexible Heroku, but (monkey's paw curls) you can only interact with it by typing large amounts of YAML.
Oh, and all the documentation for that YAML assumes you've memorized as much vocabulary as a Foreign Language 101 class.
(And there is a mad god that says: if you try to use click-ops to get around this without knowing the vocabulary, you're going to have a bad time.)
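For anyone who hasn't stared at it yet, here's a rough sketch of the kind of YAML involved for even a trivial web app (every name and the image are placeholders). Note how much vocabulary (Deployment, selector, template, Service) a "hello world" already assumes:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                     # placeholder name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: web                  # must match the pod template labels below
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: registry.example.com/web:1.0   # placeholder image
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web
    spec:
      selector:
        app: web                    # routes traffic to pods carrying this label
      ports:
        - port: 80
          targetPort: 8080

And that's before Ingress, ConfigMaps, Secrets, or anything resembling persistence.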
But on the other hand, to put it in terms of the "3 servers": the moment you have 3 servers and any level of uptime expectations, you'll inevitably have to rebuild them, services and logging and all, from scratch often enough and quickly enough that you might as well have 20 servers, given how stressful that rebuild will be.
k8s can be a saving grace there, and I recommend it to anyone with the time and interest in how cluster best practices work! But it's not a free option or a weekend skill-up.
Most products barely need 1 server.
And if you ever outgrow that single server, it's going to be a huge pain, and so will a hardware failure. If you start on Kubernetes early you'll be able to add more servers for capacity very easily. Not to mention you get failover and HA out of the box, and you can deploy managed-service equivalents: database deployments, object stores, etc.
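To make the "database deployments" bit concrete, here is roughly what it looks like with one of the Postgres operators, e.g. CloudNativePG (the operator choice and the exact field names are my assumption from memory; treat this as a sketch, not a recipe):

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: app-db            # placeholder name
    spec:
      instances: 3            # one primary plus two replicas, with automated failover
      storage:
        size: 10Gi            # provisioned from whatever StorageClass the cluster offers

One manifest gets you something that looks a lot like a small managed database, which is the appeal, and also exactly the kind of thing the replies below argue most sites don't actually need.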
But 99% of the time you don't outgrow it and don't have SLAs requiring failover or HA. This is why so many sites can get away with using PostgreSQL despite its complete lack of good native HA.
HA and failover, to me, in these small deploys, are more about hardware maintenance than about maintaining SLAs: being able to shut down the computer hosting your containers in order to scale vertically.
You could always run a VM / VPS against a managed DB. Many small cloud / VPS providers, like Digital Ocean or Vultr, also offer managed DB services that are cheaper than AWS RDS.
All things people used to own 10 years ago. It’s not like the people doing that stuff have vanished.
Cloud’s big promise was speed to market and price, and let’s be honest, price is no longer there compared to a decent operation.
The one thing where clouds remain kings is speed for small teams. Any large enough company should probably ask itself whether running its own operation using IaaS would be a better choice.
My company is on-prem, spending north of 1 billion per year. Cloud is actually cheaper when considering total cost of ownership: salaries, opex, and capex. On top of that, our speed to delivery is generally worse.
Because on-prem is inelastic, we are at sub-10% peak utilization of compute resources. If we factor in the likely higher utilization rate in the cloud, we are talking 30%+ savings versus on-prem.
> we are at sub 10% peak utilization of compute resources
so... you bought way too much hardware?
That's not unusual. First off, sometimes $1,000 extra will get you a ton more compute than you need, so why not; and second, on-prem tends to be extremely inelastic, so you buy a ton of compute because you never know when requirements will change.
If we're talking on the scale of $1,000s then it's cheaper to run on-prem than in the cloud. It's really easy to spend $1,000 on managed kubernetes and have very little actual compute.
Peak utilization is a tough one for on-prem and is a decent argument for cloud. I was at a company that was also at <10% peak utilization most of the time. It was finance, so it was mostly doing nothing, except for the couple of days a year where we shot up 10,000x, so we had to build for that case. So yeah, dropping the data centers and moving to cloud was a cost savings.
So the "obvious" (but complex!) solution is a "hybrid cloud": use on-prem for the predictable, constant "DC component" of your demand, and use cloud for cyclical or unpredictable demand changes. That will keep peak demand decoupled from permanently provisioned capacity while saving on always-utilized capacity. Easier said than done, of course.
The route we ended up going was hybrid cloud, with a colocation for on-prem (mainframes and a few servers) and AWS for the cloud portion. Not sure what the cost savings were. Since I wrote the authentication service, I knew what those numbers were: on a normal day we would be sitting at 1% CPU usage basically all day except market open/close. Fiscal quarter ends were a big bump, and then of course big news days. 99% of the time, on 99% of days, it was just nothing.
K8s helps reduce that complexity a lot.
My comparison was cloud vs renting metal, not buying your own machines.
You have to pick your battles. Most of this stuff isn't necessary to babysit until you're scaling your app tremendously. And by the time you're doing that I'm sure you've got the people to do these things.
And then why would you need koob at all? All that setup and learning on a platform you don't understand and won't need to manage and you will do it wrong, so a completely wasted set of time and money afaict.
IMO, there's no real alternative if you want a private cloud.
Exactly. https://www.reddit.com/r/kubernetes/comments/u9b95u/kubernet...
It's crazy to me how much unmaintainable spaghetti mess gets deployed in every other environment I've been in. "koober" environments are the most organized.
And particularly the upgrades every 3 months. Not just your nodes and masters, but every operator you use, and your manifests each time a beta API version they rely on gets deprecated.
I've found Nomad to be a much simpler replacement for smaller-scale deployments.
It's a well-known thing that if you run on EC2 they handle all those things for you (especially the security part).
I wonder how many people will miss the sarcasm. ;) Cloud means you don't need any ops people, right?
Speaking as someone who deploys HA Kubernetes clusters on bare metal just for fun: within just a few seconds I noticed something which made me stop reading this article altogether.
Just deploy Rook and Ceph? ARE YOU BLEEPING KIDDING ME?!?
There's a job description called "Storage Engineer". These people know a little bit about Kubernetes, but are mostly specialized in everything Ceph. That tells you everything about how hard it is to keep Ceph humming along in production. As a side note: if you want to make really good money, there's also somebody called a "Ceph consultant" who is called in when SHTF. And if SHTF in a Ceph cluster, it really does.
And that's besides all the crap it takes to get and keep Kubernetes running smoothly: Kernel Optimization. Networking. Security. Storage integration. Observability. And the list goes on...
In other words, unless you are VERY well versed in a variety of topics, from server architecture to deep Linux knowledge, and are knee-deep in the usual day-to-day operations stuff already, you are better off running Kubernetes in the cloud and leaving all the intricacies to the likes of Google, Microsoft and Amazon rather than trying to run a well-designed cluster architecture yourself. It just isn't worth it.
IMO an article like this shouldn't just make the claim - it should show how to do it at the home lab level.
Documentation is out there and readily available. I have k8s in my homelab: a server rack of some modern-ish PowerEdges, failover hypervisors, Ansible playbooks, etc. Just a single guy, not a team or anything. It's really up to the reader to go do it.
To be fair, even at a small scale something like k3s is usable and not that hard to run.
Also, isn't this the promise that k8s had from the beginning... that it would be the one cloud abstraction to rule them all?
> This autonomy is a superpower for small teams. We detailed the financial side of this journey in How moving from AWS to Bare-Metal saved us $230,000 /yr. The cultural unlock has been even bigger.
This doesn't seem to be aimed at homelab but small teams.
Pretty much just install talos and you're done. Deploy the services you need after that.
Then install the rest of the owl.
I mean yeah, unless you want a raven, or a hawk. Kubernetes includes only the bare minimum out of the box. It's very easy to add more services though.
https://knowyourmeme.com/memes/how-to-draw-an-owl
> it should show how to do it at the home lab level
I don't need to autoscale my home lab...
I want a better UI/DX/Interface than Kubernetes...
I need to be able to do things "by hand" as well as "automated" at home...
There is a reason that I use Proxmox at home. Because it is a joy to work with for the simple needs of my home lab.
Been using Nomad and different sorts of K8S for the last 6+ years at home/work. Nomad is easier to bootstrap, lighter on resources, and so, so, so much easier to wrap your head around. Just Nomad + an NFS server is, from my perspective, a perfect start for a homelab/small project. You can add complexity to it as you go. It is a real joy to work with after a day of tinkering. Want to run on Windows? Sure. BSD? There is a driver for it. Don't want Docker? There is a Podman driver. OCI sucks? Just run binaries without isolation. Need VMs? You can switch from Proxmox to a purely Nomad setup with the QEMU driver with a bit of sweat. Illumos zones on OmniOS? Weird, and it takes quite a bit of time, but there was a repo on GitHub; you just need to build the binary with the patch.
And while k8s can do all the same things and much more with a bit of trying, it requires mission control the second you add a second developer; you'll have built-in primitives that compete all the time with the ones you bolt on, etc. Nomad feels much more opinionated, and in a good way.
Nomad is one of those things that gets you 90% of the way with 20% of the effort, and only then, if you need something, do you add to it. K8S is great, way more flexible, there are managed options out there, massive ecosystem, but it always feels like out of the box you need to glue 5 different tools onto it just to get it going.
Also Incus. Stéphane Graber is doing the lord's work by sticking to his thing. That's also super fun to mess with.
In my experience Nomad is a much better system. People avoid it because they fear vendor lock-in, but the fact that Hashicorp controls it also means that it is well-designed and interoperates easily with other Hashicorp tools.
People also avoid it because it is a fringe system and there's much more knowledge and tools around Kubernetes.
You say that like it's a good thing https://www.netlify.com/v3/img/blog/cncf-landscape-map-2020....
I feel like articles like this need to come with a diagram like this to put it in the context of relevant tradeoffs.
                       High Scale/Revenue
                                │
                                │
   Managed Services             │   Self-hosted K8s
   (Overpaying but              │   (article is
   no choice)                   │   pitched here)
                                │
    ────────────────────────────┼────────────────────────────
   Low capacity                 │   High capacity
                                │
                                │
   Managed Services             │   Managed Services
   (Right choice -              │   (Wasting money on
   focus on product)            │   platform team)
                                │
                        Low Scale/Revenue
Or something like that. Maybe as a function of time as well, but that might be harder to do in 2D.
Sure, I can absolutely manage my own k8s, but there is no doubt it's easier for me to spin up Postgres and ship faster on my own. At enterprise scale it's definitely a lot easier to do everything in k8s and be able to manage as many aspects as possible. I have experience of both.
I managed Ceph in the past. I cannot comprehend someone putting up with the headache that is Ceph in their home lab. To each their own!
I've used Ceph together with Proxmox VE extensively. No problems whatsoever.
And in related news, Proxmox VE is often a more sensible thing to use for a private cloud environment, because it is far more flexible and easier to use than Kubernetes.
The root cause here is just that managing any kind of storage service is instantly painful. The property of "not losing data" means that you are sort of required to always be doing something in order to keep it healthy.
For small setups it's honestly fine with Rook. For large ones, yeah, better dust off your Ceph PhD.
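For the small-setup case, the Rook side really is mostly one custom resource; a rough sketch of a CephCluster (the image tag and counts are illustrative, and the field names are from memory, so check the Rook docs):

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v18     # illustrative Ceph release
      dataDirHostPath: /var/lib/rook     # where monitor state lives on each host
      mon:
        count: 3                         # three monitors for quorum
      storage:
        useAllNodes: true                # let Rook claim the raw disks it finds
        useAllDevices: true

The YAML is the easy part; the "dust off your Ceph PhD" part is what happens when an OSD or a mon misbehaves.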
As much as I'm glazing k8s in this thread, I haven't managed to get Ceph working. I wish I could, since I don't want to use MinIO anymore.
Longhorn, though, just kinda worked out of the box with a couple of kernel/system settings. No S3 API though.
But this isn't k8s's fault at all.
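For comparison, the Longhorn path mentioned above really is just a StorageClass and a PVC once the chart is installed (the provisioner name is Longhorn's CSI driver; the replica count and sizes are examples):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: longhorn-2r               # placeholder name
    provisioner: driver.longhorn.io   # Longhorn's CSI driver
    parameters:
      numberOfReplicas: "2"           # replicate each volume across two nodes
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      storageClassName: longhorn-2r
      resources:
        requests:
          storage: 20Gi

Block volumes only, though, which is why it doesn't replace MinIO's S3 API.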
Counter Argument to "HA HA, Kubernetes sucks": https://www.macchaffee.com/blog/2024/you-have-built-a-kubern...
Kubernetes is a rat's nest, and I have long hoped for Kubernetes to be simpler (who needs this Gateway API?), but devs keep building crazier and crazier solutions, so we have to pivot to keep up.
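For anyone who hasn't met it: the Gateway API is the newer, more verbose successor to Ingress. A rough sketch of the two resources it takes to route one hostname; the names are placeholders, and the gatewayClassName depends entirely on which controller you install:

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: public-gw                  # placeholder
    spec:
      gatewayClassName: example-class  # depends on your controller (Envoy, NGINX, ...)
      listeners:
        - name: http
          protocol: HTTP
          port: 80
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: web-route                  # placeholder
    spec:
      parentRefs:
        - name: public-gw              # attach this route to the Gateway above
      hostnames:
        - "app.example.com"
      rules:
        - backendRefs:
            - name: web                # the Service to send traffic to
              port: 80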
Kubernetes is powerful, yes. It is also a feckless rat's nest of bolt-ons and ride-alongs. It's SharePoint levels of byzantine: tuning so complex that, like SharePoint, it comes with its own bespoke administrators who often have little or no knowledge of basic networking or operating systems, only Kubernetes.
- Upgrading a Kubernetes cluster may as well be an Olympic sport. It's so draconian that most best-practice documentation insists you build a second cluster for A/B deployment.
- Load balancers come in half a dozen flavours, with the default options joined at the hip to the cloud cartel. MetalLB is an option, but your admin doesn't understand subnets, let alone BGP.
- It is infested with the cult of immutability. Pod not working? Destroy it. Network traffic acting up? Destroy the node. Container not working? Time to destroy it. Cluster down? Rebuild the entire thing. At no point does the "devops practitioner" stop to consider why or how a piece of Kubernetes has betrayed them. It is assumed you have a football field of fresh bare metal to reinitialize everything onto at a moment's notice, failure modes be damned.
What your company likely needs is some implementation of libvirtd or Proxmox. Run your workloads on rootless Podman or (god forbid) deploy to a single VM.
> what your company likely needs is some implementation of libvirtd or proxmox. run your workloads on rootless podman or (god forbid) deploy to a single VM.
Even with a single VM, someone's company will probably also want a reverse proxy and certificate management (assuming web services), automated deployments, a way to provide secrets to services, storage volumes, health checks with auto-restarts, the ability to wire logs and metrics into some type of monitoring system, etc. All of this is possible with scripts and config-management tools, but now complexity is just being managed in different ways. Alternatively, use K3s and Flux to end up with a solution that checks all of those boxes while also keeping the option to use the same k8s manifests in public clouds.
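To make "checks those boxes" concrete, here's a sketch of what health checks plus the reverse-proxy/TLS part look like once you're on K3s or any other cluster. It assumes cert-manager with a ClusterIssuer named letsencrypt and the ingress-nginx controller (K3s actually ships Traefik by default), it omits the Service that would sit between the two resources, and every name is a placeholder:

    # health checks with auto-restarts: probes on the container spec
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api                                 # placeholder
    spec:
      replicas: 1
      selector:
        matchLabels: { app: api }
      template:
        metadata:
          labels: { app: api }
        spec:
          containers:
            - name: api
              image: registry.example.com/api:1.0       # placeholder
              livenessProbe:                            # restart the container if this fails
                httpGet: { path: /healthz, port: 8080 }
              readinessProbe:                           # pull it out of rotation until this passes
                httpGet: { path: /ready, port: 8080 }
    ---
    # reverse proxy + certificates: an Ingress plus a cert-manager annotation
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: api
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt   # assumes cert-manager and that issuer exist
    spec:
      ingressClassName: nginx                         # assumes ingress-nginx
      tls:
        - hosts: [api.example.com]
          secretName: api-tls
      rules:
        - host: api.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: api
                    port: { number: 80 }

The equivalent on a single VM is nginx/Caddy configs, certbot timers, systemd units, and restart policies; neither pile of config is free, which is the point above about complexity just being managed in different ways.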
I'll preface this by saying that my experience is all low/medium traffic and single cluster, and I've never had to develop for Kubernetes. But as a sysadmin, I don't mind it at all. I started a new job and learned by being given logins for an AWS EKS mobile/web application backend and an on-prem OpenShift cluster for an internal application. The whole thing mostly works logically once you understand the underlying concepts, and the documentation is pretty good. The only issues I ever had that required external assistance were platform-specific quirks (like EKS ALB annotations most recently). I even moved several of our single-server workloads over.
I don't have any of this experience. I only have to change the version number and the upgrades roll themselves out.
MetalLB is good, yes, and admins should have IP knowledge. I ask about this in interview questions.
Yes, sheep not pets is the term here. Self healing is wonderful. There's plenty to dig into if you run into the same problem repeatedly. Being able to yank a node out that's misbehaving is very nice from a maintenance pov.
Talos on bare metal to get kubernetes features is pretty good. That's what my homelab is. I hated managing VMs before that.
Nix manages to be immutable without restarting everything from scratch.
The complaint isn't immutability; the complaint is that k8s does immutability in a broken, way-too-granular fashion.
I'm not really clear on the complaint. Is it immutability or not? I'm not saying delete the cluster and start over; I'm saying I can yank a node or destroy a container without (much of) a consequence. Talos is immutable in a way similar to Nix, AFAIK.
I guess the complaint is that with resources being immutable, the only standard & recommended way to deal with a problem is to take the resource out.
I know that is the whole point of sheep vs pets, but it somehow became the "did you try restarting the PC" of operations.
Only small parts of the typically used portions of the Kubernetes API are immutable, and those have good reasons. So I'm still not really sure what issue you're describing.
> MetalLB is an option, but your admin doesnt understand subnets let alone BGP
Maybe get someone competent then? Why are you tasking someone who doesn't understand basic networking with running an on-prem setup?
To be fair, BGP is definitely beyond "basic" networking. There is a learning curve.
The subset of it you need to grok to set up MetalLB is in fact pretty basic.
I actually do agree. I've set up MetalLB in my home lab and it was super simple. I've been doing networking since the '90s though, both professionally and as a hobby where I operate my own AS for fun, so I can see how someone else could be intimidated.
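For reference, the L2 mode that most homelabs want is just two small resources and no BGP at all (the namespace is MetalLB's usual install target; the address range is an example slice of your own LAN):

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: lan-pool
      namespace: metallb-system
    spec:
      addresses:
        - 192.168.1.240-192.168.1.250   # example: a spare range on your LAN
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: lan-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
        - lan-pool                      # answer ARP/NDP for addresses from this pool

BGP mode is where the networking knowledge starts to matter; L2 mode is roughly "pick some spare IPs".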
Odd, we went from a bare instance to VM to a tiny k8s cluster, and k8s was the most stable and easy to administer of the lot.
> It is infested with the cult of immutability
Immutability is like violence: if it doesn't solve your problem, you aren't using enough of it.
As someone who self-hosted bare-metal Kubernetes on my own rack: it's a lot of work to get it set up. We used Red Hat OpenShift, which has a pretty good solution out of the box, but the learning curve was relatively high.
That being said, once it was set up, there was not a lot of maintenance. Kubernetes is quite resilient when set up properly, and the cost savings were significant.
Try Talos next time. It took minutes to set up. Red Hat docs and product names scare me since they are intentionally obtuse. I thought I wanted OpenShift, but there's no way I'm paying, and I couldn't figure out how to even get started. Talos was such a breeze.
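For a sense of scale: the Talos machine config is YAML too, but talosctl gen config generates almost all of it, and the hand-written part is typically a tiny patch like this (the disk path and hostname are assumptions about your hardware and naming; field names are from memory, so check the Talos machine-config reference):

    # merged into the config generated by talosctl gen config
    machine:
      install:
        disk: /dev/sda        # assumption: the disk Talos should install itself to
      network:
        hostname: node1       # assumption: whatever naming scheme you use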
> because everyone else seemed to do it
This is not a good reason to have done it. To me this means that the expectations and outcomes were flawed as they are solving a problem that shouldn't have existed. I can't really agree with the sentiment or overview of this post
It'd be easier to ask Kubernetes fans what Kubernetes isn't at this point.
As long as you have someone to babysit your cluster.
Great post, but maybe missing some things, like configuring load balancers with MetalLB, CNIs like Calico, how you can get your own IPv6 address space from ARIN, and API gateways.
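On the Calico point: with the Tigera operator, the CNI itself also boils down to a small custom resource. A hedged sketch (the CIDR must match your cluster's pod network, and the encapsulation choice depends on your underlay):

    apiVersion: operator.tigera.io/v1
    kind: Installation
    metadata:
      name: default                 # the operator expects exactly this name
    spec:
      calicoNetwork:
        ipPools:
          - cidr: 10.42.0.0/16      # example pod CIDR, must match the cluster
            encapsulation: VXLAN    # or IPIP / None, depending on the network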
There is a fundamental misunderstanding people have about technology that you only get from operating and maintaining it. There's what something can do, and then there's actually doing it.
K8s is complicated as hell to learn to use. Its learning curve is very shallow. Yes, you can get a "hello world" running quickly, but that is not the benchmark for whether you actually understand what's going on, or how to make it do what you need.
But once you do learn it thoroughly, it's ridiculously fast to ramp up on it and use it for very complex things, very quickly. As a developer medium, as an operational medium, it accelerates the kind of modern practices (that for some reason most people still don't know about?) that can produce a lot of value.
But that's if someone else is building and maintaining the underlying "guts" of it for you. If you try to build it from scratch and maintain it on bare metal, suddenly it's incredibly complicated again with another shallow learning curve and a lot of effort. And that effort continues, because they keep changing the fucking thing every 6-12 months...
It's like learning to drive a car or ride a bike. At first it's difficult and time-consuming. But then you get it, and you can ride around pretty easily, and it can take you far. However, that does not mean you understand car/bike mechanics, or can take it apart and rebuild it when something breaks. So be prepared to pay someone for that maintenance, or be prepared to become a mechanic (and do that work while also doing your day job). This analogy is stretched thinner by the fact that nobody's constantly changing how your car works...
No, what I need is NixOS: configuration in a language that's a bit hard to digest but effective, which I can write and read, allowing me to replicate my infrastructure, create custom ISOs, etc., in an almost totally automatic and manageable way, on domestic hardware at a domestic scale.
What's needed isn't rambling YAML and immense resource consumption; it's IaC, built into the system, that does what's necessary not for an IT giant that lives off running other people's services in-house, but for me, a private citizen with just a few of my own services and little time to manage them. I need to be able to replicate the infrastructure quickly because I don't have infinite data centers: if the home server dies, I need to buy another cheap desktop, set it up, and restore it on the fly. If I'm offline for a few hours, nothing happens, but hardware costs money, so I need to use it well, and so on.
The giants' solutions are not one-size-fits-all.
Is this a joke? It's like the kind of article about kubernetes that we would have seen 10 years ago. Especially some of the ridiculous claims like you can run every service on your local k8s that you could run in the cloud. No, building managed service equivalents to run locally is not trivial.
Can't wait for k8s hype to go the way of microservices.
One perpetuates the other. If you have k8s, you are tempted to use microservice architecture and if you have microservices then k8s makes them look manageable.
I suspect both of them will go down together if/when they do.
It won't; they'll just come up with a way to abstract it away by adding yet another layer to the DevOps stack. Maybe something with AI.
I don't think Kubernetes is inherently bad... it's just a tool that engineers are about 10x more likely to use as a footgun than as a nailgun.
Now you have two problems: kubernetes and your private cloud. The second being that you decided you needed "cloud" to start with.
You do not need kubernetes