---
title: The Kubernetes Alternative I Wish Existed
date: 2023-10-01
draft: true
---
<script>
import Sidenote from '$lib/Sidenote.svelte';
</script>
I use Kubernetes on my personal server, largely because I wanted to get some experience working with it. It's certainly been helpful in that regard, but after a year and a half or so I think I can pretty confidently say that it's not the ideal tool for my use-case. Duh, I guess? But I think it's worth talking about _why_ that's the case, and what exactly _would_ be the ideal tool.
## The Kubernetes Way™
Kubernetes is a very intrusive orchestration system. It would very much like the apps you're running to be doing things _its_ way, and although that's not a _hard_ requirement it tends to make everything subtly more difficult when that isn't the case. In particular, Kubernetes is targeting the situation where you:
* Have a broad variety of applications that you want to support,
* Have written all or most of those applications yourself,<Sidenote>The "you" here is organizational, not personal.</Sidenote>
* Need those applications to operate at massive scale, e.g. concurrent users in the millions.
That's great if you're Google, and surprise! Kubernetes is a largely Google-originated project,<Sidenote>I'm told that it's a derivative of Borg, Google's in-house orchestration platform.</Sidenote> but it's absolute _garbage_ if you (like me) are just self-hosting apps for your own personal use and enjoyment. It's garbage because, while you still want to support a broad variety of applications, you typically _didn't_ write them yourself and you _most definitely don't_ need to scale to millions of concurrent users. More particularly, this means that the Kubernetes approach of expecting everything to be aware that it's running in Kubernetes and make use of the platform (via cluster roles, CRDs, etc.) is very much _not_ going to fly. Instead, you want your orchestration platform to be as absolutely transparent as possible: ideally, a running application should need to behave no differently in this hypothetical self-hosting-focused orchestration system than it would if it were running by itself on a Raspberry Pi in your garage. _Most especially_, all the distributed-systems crap that Kubernetes forces on you is pretty much unnecessary, because you don't need to support millions<Sidenote>In fact, typically your number of concurrent users is going to be either 1 or 0.</Sidenote> of concurrent users, and you don't care if you incur a little downtime when the application needs to be upgraded or whatever.
## But Wait
So then why do you need an orchestration platform at all? Why not just use something like [Harbormaster](https://gitlab.com/stavros/harbormaster) and call it a day? That's a valid question, and maybe you don't! In fact, it's quite likely that you don't - orchestration platforms really only make sense when you want to distribute your workload across multiple physical servers, so if you only have the one then why bother? However, I can still think of a couple of reasons why you'd want a cluster even for your personal stuff:
* You don't want everything you host to become completely unavailable if you bork up your server somehow. Yes, I did say above that you can tolerate some downtime, and that's still true - but especially if you like tinkering around with low-level stuff like filesystems and networking, it's quite possible that you'll break things badly enough<Sidenote>And be sufficiently busy with other things, given that we're assuming this is just a hobby for you.</Sidenote> that it will be days or weeks before you can find the time to fix them. If you have multiple servers to which the workloads can migrate while one is down, that problem goes away.
* You don't want to shell out up front for something hefty enough to run All Your Apps, especially as you add more down the road. Maybe you're starting out with a Raspberry Pi, and when that becomes insufficient you'd like to just add more Pis rather than putting together a beefy machine with enough RAM to feed your [Paperless](https://github.com/paperless-ngx/paperless-ngx) installation, your [UniFi controller](https://help.ui.com/hc/en-us/articles/360012282453-Self-Hosting-a-UniFi-Network-Server), your Minecraft server(s), and your [Matrix](https://matrix.org) server.
Okay, sure, maybe this is still a bit niche. But you know what? This is my blog, so I get to be unrealistic if I want to.
## So what's different?
Our hypothetical orchestration system starts out in the same place as Kubernetes--you have a bunch of containerized applications that need to be run, and a pile of physical servers on which you'd like to run them. You want to be able to specify, at a high level, what should run and how many copies of it, and so on. You don't want to worry about the fiddly details like deciding which container goes on which host, or manually moving all of `odin`'s containers to `thor` when the Roomba runs over `odin`'s power cable while you're on vacation on the other side of the country.
So that much is the same. But we're going to do everything else differently.
Where Kubernetes is intrusive, we want to be transparent. Where Kubernetes is flexible and pluggable, we will be opinionated. Where Kubernetes wants to proliferate statelessness and distributed-systems-ism, we will be perfectly content with stateful monoliths.<Sidenote>And smaller things, too. Microliths?</Sidenote> Where Kubernetes expects cattle, we will accept pets. And so on.
## The Goods
### Docker-Image Based
It's 2023 and the world has more or less decided on Docker<Sidenote>I know we're supposed to call them "OCI Images" now, but they'll always be Docker images to me. Docker started them, Docker popularized them, and then Docker died because it couldn't figure out how to monetize an infrastructure/tooling product. The least we can do is honor its memory by keeping the name alive.</Sidenote> images as the preferred format for packaging server applications. Are they efficient? Hell no. Are they annoying and fiddly, with plenty of [hidden footguns](https://danaepp.com/finding-api-secrets-in-hidden-layers-within-docker-containers)? You bet. But they _work_, and they've massively simplified the process of getting a server application up and running. As someone who has had to administer a Magento 2 installation, I find that hard not to appreciate.
They're especially attractive to the self-hosting-ly inclined, because a well-maintained Docker image tends to keep _itself_ up to date with a bare minimum of automation. I know "automatic updates" are anathema to some, but remember, we're talking self-hosted stuff here--sure, the occasional upgrade may break your Gitea<Sidenote>Actually, probably not. I've been running Gitea for years now and never had a blip.</Sidenote> server, but I can almost guarantee that you'll spend less time fixing that than you would have manually applying every update to every app you ever wanted to host, forever.
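"Bare minimum of automation" really can mean a couple of lines of config. One common approach (not the only one, and the tag below is just illustrative) is to run [Watchtower](https://containrrr.dev/watchtower/) alongside everything else and let it re-pull and restart containers when their tags move:

```yaml
# Sketch: a Compose/Swarm service that keeps the rest of the stack updated.
services:
  watchtower:
    image: containrrr/watchtower:latest
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # needs the Docker API to restart things
    command: --interval 86400                        # check for new images once a day
```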
So our hypothetical orchestrator is going to use Docker images. But there's a complication: It can't use Docker to run them, or even the lower-level components like `containerd` or `cri-o`, because it's going to be doing it all with...
### Firecracker
You didn't write all these apps yourself, and you don't trust them any further than you can throw them. Containers are great and all, but you'd like a little more isolation. Enter [Firecracker](https://firecracker-microvm.github.io/), which runs each workload in its own lightweight VM. This does add some complexity where resource management is concerned, especially memory, since by default Firecracker wants you to allocate everything up front. But maybe that's ok, or maybe we can build in some [ballooning](https://github.com/firecracker-microvm/firecracker/blob/main/docs/ballooning.md) to keep things under control.
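For a sense of what "allocate everything up front" looks like, here's roughly what a Firecracker VM config file contains--paths and sizes are placeholders, and I'm going from memory of the docs, so treat the exact field names as approximate:

```json
{
  "boot-source": {
    "kernel_image_path": "vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    { "drive_id": "rootfs", "path_on_host": "rootfs.ext4", "is_root_device": true, "is_read_only": false }
  ],
  "machine-config": {
    "vcpu_count": 2,
    "mem_size_mib": 1024
  },
  "balloon": {
    "amount_mib": 0,
    "deflate_on_oom": true,
    "stats_polling_interval_s": 1
  }
}
```

`mem_size_mib` is the up-front allocation; the balloon device is what would let the orchestrator claw memory back from an idle VM later.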
Now, since we're running Docker images in Firecracker microVMs, we're going to need a method for converting Docker images _into_ something Firecracker can boot. In particular, we're going to need to turn a Docker image into a root filesystem image, which is [definitely doable](https://fly.io/blog/docker-without-docker/) but not _completely_ trivial.
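The core of the trick is that a Docker image is, underneath the layer metadata, just a tarball of a filesystem. A minimal sketch of the conversion (image name, size, and paths are placeholders; a real implementation would read the layers directly rather than shelling out to Docker):

```sh
# Flatten the image into a single tarball using a throwaway container.
CID=$(docker create ghcr.io/example/app:latest)   # hypothetical image
docker export "$CID" -o rootfs.tar
docker rm "$CID"

# Build an ext4 image and unpack the tarball into it.
truncate -s 2G rootfs.ext4
mkfs.ext4 -F rootfs.ext4
mkdir -p /tmp/rootfs
sudo mount -o loop rootfs.ext4 /tmp/rootfs
sudo tar -xf rootfs.tar -C /tmp/rootfs
sudo umount /tmp/rootfs
# rootfs.ext4 can now be handed to Firecracker as the root drive, alongside a
# guest kernel and an init that starts the image's entrypoint.
```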
### Networking
Locked-down by default. You don't trust these apps, so they don't get access to the soft underbelly of your LAN. So it's principle-of-least-privilege all the way. Ideally, when you're specifying a new app, you should be able to grant it network access to an existing app right there in the new app's config, rather than having to go back and modify the existing one's.
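To be clear about what I mean, here's a purely hypothetical manifest snippet--none of this syntax exists anywhere, it's just the shape I want:

```yaml
# Hypothetical: the *new* app declares what it needs to reach.
app: woodpecker
network:
  isolated: true        # default: no LAN, no other apps
  allow:
    - to: gitea         # an app that's already deployed; its config doesn't change
      port: 3000
```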
### Storage
Kubernetes tends to work best with stateless applications. It's not entirely devoid of [tools](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) for dealing with state, but state requires persistent storage and persistent storage is hard in clusters. I get the sense that for a long time you were almost completely on your own here, although recent options like [Longhorn](https://longhorn.io) are improving the situation.
Regardless, we're self-hosting here, which means virtually _everything_ has state. But fear not! Distributed state is hard, yes, but most of our apps aren't going to be truly distributed. That is, typically there's only going to be one instance running at a time, and it's acceptable to shut down the existing instance before spinning up a new one. So this problem becomes a lot more tractable. Roughly, what I want from storage:
* Asynchronous replication
* Single-writer, multi-reader
* Does this exist?
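In the same hypothetical manifest syntax as above (again, none of this exists; it's a sketch of the semantics I'm after), a volume declaration might look like:

```yaml
# Hypothetical: per-app persistent storage with lazy, asynchronous replication.
app: paperless
volumes:
  - name: data
    size: 20Gi
    replication:
      mode: async         # fine to lose the last few seconds of writes on failover
      copies: 2           # keep a warm replica on one other node
    access: single-writer # at most one running instance mounts it read-write
```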
### Configuration
YAML, probably? It's fashionable to hate on YAML right now, but I've always found it rather pleasant.<Sidenote>Maybe people hate it because their primary experience of using it has been in Kubernetes manifests, which, fair enough.</Sidenote> JSON is out because no comments. TOML is out because nesting sucks. Weird niche supersets of JSON like HuJSON and JSON5 are out because they've been around long enough that if they were going to catch on, they would have by now. Docker Swarm config files<Sidenote>Which are basically just Compose files with a few extra bits.</Sidenote> are my exemplar par excellence here. (Of course they are; DX has always been Docker's Thing.) A quick comparison of Kubernetes and Swarm YAML for the same service makes the point:
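A minimal sketch--names and image tags are just placeholders, and I've trimmed both versions to the essentials (the Kubernetes side would also need a PersistentVolumeClaim and probably an Ingress to be genuinely equivalent):

```yaml
# Swarm stack file: one service, one port, one volume.
services:
  gitea:
    image: gitea/gitea:1.21
    ports:
      - "3000:3000"
    volumes:
      - gitea-data:/data
    deploy:
      replicas: 1
volumes:
  gitea-data:
```

```yaml
# Roughly the same thing in Kubernetes (storage omitted).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitea
spec:
  replicas: 1
  selector:
    matchLabels: { app: gitea }
  template:
    metadata:
      labels: { app: gitea }
    spec:
      containers:
        - name: gitea
          image: gitea/gitea:1.21
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: gitea
spec:
  selector: { app: gitea }
  ports:
    - port: 3000
      targetPort: 3000
```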
We are also _definitely_ going to eschew the Kubernetes model of exposing implementation details in the name of extensibility.<Sidenote>See: ReplicaSets, EndpointSlices. There's no reason for these to be first-class API resources like Deployments or Secrets, other than to enable extensibility. You never want users creating EndpointSlices manually, but you might (if you're Kubernetes) want to allow an "operator" service to fiddle with them, so you make them first-class resources because you have no concept of the distinction between external and internal APIs.</Sidenote>
### Workload Grouping
It's always struck me as odd that Kubernetes doesn't have a native concept for a heterogeneous grouping of pods. Maybe it's because Kubernetes assumes it's being used to deploy mostly microservices, which are typically managed by independent teams--so workloads that are independent but in a provider/consumer relationship are being managed by different people, probably in different cluster namespaces anyway, so why bother trying to group them?
Regardless, I think [Nomad](https://www.nomadproject.io/) gets this exactly right with its job/group/task hierarchy. I'd like to just copy that wholesale, but with more network isolation.
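To gesture at what "copy that wholesale" might look like in this system's own (again, entirely hypothetical) YAML rather than Nomad's HCL:

```yaml
# Hypothetical: a Nomad-style job/group/task hierarchy.
# Groups are the unit of placement; tasks in a group land on the same node.
job: paperless
groups:
  - name: app
    tasks:
      - name: web
        image: ghcr.io/paperless-ngx/paperless-ngx:latest
      - name: redis
        image: redis:7
    network:
      allow:
        - to: db          # only the app group can reach the database group
  - name: db
    tasks:
      - name: postgres
        image: postgres:16
```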