Compare commits

2 Commits

2 changed files with 62 additions and 0 deletions

@ -30,12 +30,34 @@ So then why do you need an orchestration platform at all? Why not just use somet
Okay, sure, maybe this is still a bit niche. But you know what? This is my blog, so I get to be unrealistic if I want to.
## So what's different?
Our hypothetical orchestrator system starts out in the same place as Kubernetes--you have a bunch of containerized applications that need to be run, and a pile of physical servers on which you'd like to run them. You want to be able to specify at a high level in what ways things should run, and how many of them, and so on. You don't want to worry about the fiddly details like deciding which container goes on which host, or manually moving all of `odin`'s containers to `thor` when the Roomba runs over `odin`'s power cable while you're on vacation on the other side of the country.
So that much is the same. But we're going to do everything else differently.
Where Kubernetes is intrusive, we want to be transparent. Where Kubernetes is flexible and pluggable, we will be opinionated. Where Kubernetes wants to proliferate statelessness and distributed-systems-ism, we will be perfectly content with stateful monoliths.<Sidenote>And smaller things, too. Microliths?</Sidenote> Where Kubernetes expects cattle, we will accept pets. And so on.
## The Goods
### Docker-Image Based
It's 2023 and the world has more or less decided on Docker<Sidenote>I know we're supposed to call them "OCI Images" now, but they'll always be Docker images to me. Docker started them, Docker popularized them, and then Docker died because it couldn't figure out how to monetize an infrastructure/tooling product. The least we can do is honor its memory by keeping the name alive.</Sidenote> images as the preferred format for packaging server applications. Are they efficient? Hell no. Are they annoying and fiddly, with plenty of [hidden footguns](https://danaepp.com/finding-api-secrets-in-hidden-layers-within-docker-containers)? You bet. But they _work_, and they've massively simplified the process of getting a server application up and running. As someone who has had to administer a Magento 2 installation, it's hard not to find that appealing.
They're especially attractive to the self-hosting-ly inclined, because a well-maintained Docker image tends to keep _itself_ up to date with a bare minimum of automation. I know "automatic updates" are anathema to some, but remember, we're talking self-hosted stuff here--sure, the occasional upgrade may break your Gitea<Sidenote>Actually, probably not. I've been running Gitea for years now and never had a blip.</Sidenote> server, but I can almost guarantee that you'll spend less time fixing that than you would have manually applying every update to every app you ever wanted to host, forever.
So our hypothetical orchestrator is going to use Docker images. But there's a complication: It can't use Docker to run them, or even the lower-level components like `containerd` or `cri-o`, because it's going to be doing it all with...
### Firecracker
You didn't write all these apps yourself, and you don't trust them any further than you can throw them. Containers are great and all, but you'd like a little more isolation. Enter Firecracker, which runs each workload in its own lightweight virtual machine. This does add some complexity where resource management is concerned, especially memory, since by default Firecracker wants you to allocate everything up front. But maybe that's ok, or maybe we can build in some [ballooning](https://github.com/firecracker-microvm/firecracker/blob/main/docs/ballooning.md) to keep things under control.
Now, since we're running Docker images in Firecracker microVMs, we're going to need a method for converting Docker images _into_ a form Firecracker can boot. In particular, we're going to need to convert a Docker image to a Firecracker rootfs, which is [definitely doable](https://fly.io/blog/docker-without-docker/) but not _completely_ trivial.
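The heart of that conversion is flattening the image's layers into one filesystem tree. Here's a minimal sketch of just that step, assuming each layer has already been unpacked into a dict of path to contents. The `flatten_layers` helper and the in-memory model are hypothetical, purely for illustration; a real converter would stream layer tarballs onto a disk image and preserve ownership, modes, and symlinks. The whiteout names (`.wh.<name>` and `.wh..wh..opq`) are the ones the OCI image spec uses to mark deletions.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def flatten_layers(layers):
    """Apply layers in order (base first) and return the merged tree.

    Each layer is a dict mapping a relative path to file contents.
    """
    rootfs = {}
    for layer in layers:
        # Pass 1: apply deletions, which only affect lower layers.
        for path in layer:
            directory, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                # Opaque directory: hide everything lower layers put in it.
                prefix = directory + "/" if directory else ""
                for existing in [p for p in rootfs if p.startswith(prefix)]:
                    del rootfs[existing]
            elif name.startswith(WHITEOUT_PREFIX):
                # Plain whiteout: hide one file or one directory subtree.
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                for existing in [p for p in rootfs
                                 if p == target or p.startswith(target + "/")]:
                    del rootfs[existing]
        # Pass 2: add this layer's real files (markers are not copied).
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                rootfs[path] = content
    return rootfs


# Made-up layers: the second deletes a file, empties a directory,
# and overwrites the application binary.
base = {"etc/motd": b"hello", "app/cache/a": b"1", "app/main": b"v1"}
update = {
    "etc/.wh.motd": b"",
    "app/cache/.wh..wh..opq": b"",
    "app/cache/b": b"2",
    "app/main": b"v2",
}
rootfs = flatten_layers([base, update])
```

The merged tree is what would then get written into the rootfs image that Firecracker boots from.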
### Networking
Locked down by default. You don't trust these apps, so they don't get access to the soft underbelly of your LAN: it's principle-of-least-privilege all the way. Ideally, it should be possible to declare, when specifying a new app, that it gets network access to an existing app, rather than having to go back and modify the existing one.
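To make the "declare it on the new app" idea concrete, here's a rough sketch of a default-deny model in which the connecting side carries the declaration. Every name in it (`App`, `Network`, `connects_to`) is made up for illustration; nothing here is an existing API.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    # Names of apps this app may open connections to (hypothetical field).
    connects_to: set = field(default_factory=set)


class Network:
    """Default-deny: traffic is allowed only if the source app declared it."""

    def __init__(self):
        self.apps = {}

    def add(self, app):
        self.apps[app.name] = app

    def allowed(self, source, destination):
        app = self.apps.get(source)
        if app is None or destination not in self.apps:
            return False  # unknown endpoints (e.g. the LAN) are never reachable
        return destination in app.connects_to


net = Network()
net.add(App("postgres"))                # existing app, never edited again
net.add(App("gitea", {"postgres"}))     # new app declares what it needs
```

Since the operator writes the spec, not the app, letting the new app carry the rule doesn't hand the app any power of its own. It just means the `postgres` definition stays untouched when `gitea` shows up.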
### Storage
Kubernetes tends to work best with stateless applications. It's not entirely devoid of [tools](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) for dealing with state, but state requires persistent storage and persistent storage is hard in clusters. I get the sense that for a long time you were almost completely on your own here, although recent options (Longhorn) are improving the situation.
@ -45,3 +67,15 @@ Regardless, we're selfhosting here, which means virtually _everything_ has state
* Asynchronous replication
* Single-writer, multi-reader
* Does this exist?
### Configuration
YAML, probably? It's fashionable to hate on YAML right now, but I've always found it rather pleasant.<Sidenote>Maybe people hate it because their primary experience of using it has been in Kubernetes manifests, which, fair enough.</Sidenote> JSON is out because no comments. TOML is out because nesting sucks. Weird niche supersets of JSON like HuJSON and JSON5 are out because they've been around long enough that if they were going to catch on, they would have by now. Docker Swarm config files<Sidenote>which are basically just Compose files with a few extra bits.</Sidenote> are my exemplar par excellence here. (comparison of Kubernetes and Swarm YAML?) (Of course they are, DX has always been Docker's Thing.)
We are also _definitely_ going to eschew the Kubernetes model of exposing implementation details in the name of extensibility.<Sidenote>See: ReplicaSets, EndpointSlices. There's no reason for these to be first-class API resources like Deployments or Secrets, other than to enable extensibility. You never want users creating EndpointSlices manually, but you might (if you're Kubernetes) want to allow an "operator" service to fiddle with them, so you make them first-class resources because you have no concept of the distinction between external and internal APIs.</Sidenote>
### Workload Grouping
It's always struck me as odd that Kubernetes doesn't have a native concept for a heterogeneous grouping of pods. Maybe it's because Kubernetes assumes it's being used to deploy mostly microservices, which are typically managed by independent teams--so workloads that are independent but in a provider/consumer relationship are being managed by different people, probably in different cluster namespaces anyway, so why bother trying to group them?
Regardless, I think Nomad gets this exactly right with the job/group/task hierarchy. I'd like to just copy that wholesale, but with more network isolation.
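For anyone who hasn't used Nomad, the shape of that hierarchy is roughly this: a job contains groups, a group contains tasks, and a group is the unit that gets placed on a single host. The sketch below models only that shape; the class names, fields, and example images are illustrative assumptions, not Nomad's actual job spec.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    image: str


@dataclass
class Group:
    name: str
    tasks: list = field(default_factory=list)
    count: int = 1  # how many copies of the whole group to run


@dataclass
class Job:
    name: str
    groups: list = field(default_factory=list)

    def placements(self):
        """Expand the job into units that each land on one host together."""
        units = []
        for group in self.groups:
            for index in range(group.count):
                unit_id = f"{self.name}.{group.name}[{index}]"
                units.append((unit_id, [task.name for task in group.tasks]))
        return units


job = Job("git", [
    Group("web", [Task("gitea", "gitea/gitea"), Task("proxy", "caddy")]),
    Group("db", [Task("postgres", "postgres")]),
])
units = job.placements()
```

The heterogeneous grouping falls out for free: `web` and `db` are different kinds of workload, but they live and die as one job.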

@ -0,0 +1,28 @@
---
title: Password Strength, Hackers, and You
date: 2023-10-21
draft: true
---
<script>
import Sidenote from '$lib/Sidenote.svelte';
</script>
Every once in a while, as my friends and family can attest, I go off on a random screed about passwords, password strength, password cracking, logins, etc. To which they listen with polite-if-increasingly-glassy-eyed expressions, followed by an equally polite change of conversational topic. To avoid falling into this conversational tarpit _quite_ so often, I've decided to write it all up here, so that instead of spewing it into an unsuspecting interlocutor's face I can simply link them here.<Sidenote>Maybe I can get business cards printed, or something.</Sidenote> Whereupon they can say "Thanks, that sounds interesting," and proceed to forget that it ever existed. So it's a win-win: I get to feel like I've Made A Difference, and they don't have to listen to a half-hour of only-marginally-interesting infosec jargon.
So.
## Password Strength
Everyone knows that the "best" password is at least 27 characters long and contains both uppercase and lowercase letters, numbers, a symbol or two, at least one ~~typographical miscue~~, and at least one letter from the ancient Sanskrit, Egyptian, or Sumerian alphabet. What may be slightly less known is exactly _why_ this is the recommended approach to picking passwords, and how the same goal might be accomplished by other, less eye-gougingly awful means.
So what makes a "strong" password? Most people have a pretty good intuition for this, I think: A strong password is one that can't be easily guessed. The absolute _worst_ password is something that might be guessed by someone who knows nothing at all about you, such as `password` or `123456`.<Sidenote>`123456` is, in fact, the most common password (or was last I checked), according to [Pwned Passwords](https://haveibeenpwned.com/passwords).</Sidenote> Only slightly stronger is a password that's obvious to anyone who knows the slightest bit about its circumstances, such as your first name or the name of the site/service/etc. to which it logs you in.
Ok, so it's pretty clear what makes a _really_ bad password. But what about an only-sort-of-bad password? This is where intuition starts to veer off the rails a little bit, I think. The "guessability" of a password might be quantified as "how long, on average, would it take to guess?" Unfortunately, the intuitive picture of "guessing" a password is pretty divergent from the reality of what a password cracker is actually doing when they try to crack passwords. Most people, based on the conversations I've had, envision "password guessing" as someone sitting at a computer, typing in potential passwords one by one. Or, maybe slightly more sophisticatedly, they imagine a computer firing off attempted logins from a list of potential passwords, but critically, _against the live system that is under attack._ This is a problem, because most password cracking (at least, the kind you have to worry about) _doesn't_ take place against live login pages. Instead, it happens in what's known as an "offline" attack, when the password cracker has managed to obtain a copy of the password database and starts testing various candidates against it. To explain this, though, we have to take a little detour into...
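To put rough numbers on "how long, on average," here's a back-of-the-envelope sketch. It assumes every character is picked uniformly at random (real passwords aren't), and both guess rates are invented purely to show the difference in scale between the two kinds of attack.

```python
def average_guesses(alphabet_size, length):
    """Expected guesses to hit a uniformly random password: about half the keyspace."""
    return alphabet_size ** length / 2


def average_seconds(alphabet_size, length, guesses_per_second):
    return average_guesses(alphabet_size, length) / guesses_per_second


# Hypothetical attacker speeds, chosen only to illustrate the gap.
ONLINE_RATE = 10          # guesses/second against a live login page
OFFLINE_RATE = 10 ** 9    # guesses/second against a stolen password database

for label, rate in [("online", ONLINE_RATE), ("offline", OFFLINE_RATE)]:
    seconds = average_seconds(26, 8, rate)  # 8 random lowercase letters
    print(f"{label}: about {seconds / 86400:.2g} days")
```

Under those made-up rates, the same eight-letter password goes from a few centuries of online guessing to a couple of minutes offline.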
## Password storage
Unless the system in question is hopelessly insecure (and there are such systems; we'll talk about that in a bit) it doesn't store a copy of your password in plain text. Instead it stores what's called a _hash_, which is what you get when you run the password through a particular type of data-munging process called a _hashing algorithm_. A good password hashing algorithm has two key properties that make it perfect for this use case: It's _non-reversible_, and it's _computationally expensive_.
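In code, storing and checking a password this way might look something like the sketch below. It uses PBKDF2 from Python's standard library as a stand-in for "a good password hashing algorithm"; the iteration count is an arbitrary assumption, and the random salt is a detail the text above doesn't get into.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; higher makes every guess slower


def hash_password(password, salt=None):
    """Return (salt, digest). Only these are stored, never the password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Re-hash the login attempt and compare it with the stored digest."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("hunter2")
```

Note that `verify_password` never decodes anything: it repeats the expensive computation on the attempted password and checks whether the result matches.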
### One-way hashing
Suppose your password is `password`, and its hash is something like `X03MO1qnZdYdgyfeuILPmQ`. The non-reversibility of the hashing algorithm means that given the second value, there isn't any direct way to derive the first again. The only way to figure it out is to, essentially, guess-and-check against a list of potential candidate inputs. If that sounds a little bit like black magic, don't worry - I felt the same way when I first encountered the concept. How can a hash be irreversible _even if you know the algorithm_?
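Guess-and-check is simple enough to sketch in a few lines. As it happens, the example hash above matches the unpadded base64 encoding of the MD5 digest of `password`, so the sketch uses that as its toy hash. That's an observation about the string, not a recommendation: MD5 is far too fast to count as the "computationally expensive" kind of algorithm. The `toy_hash` and `crack` helpers and the wordlist are invented for illustration.

```python
import base64
import hashlib


def toy_hash(password):
    """Unpadded base64 of the MD5 digest. Fast and obsolete: demo only."""
    digest = hashlib.md5(password.encode()).digest()
    return base64.b64encode(digest).decode().rstrip("=")


def crack(target_hash, candidates):
    """Hash each candidate and compare; the hash is never run backwards."""
    for candidate in candidates:
        if toy_hash(candidate) == target_hash:
            return candidate
    return None


wordlist = ["123456", "letmein", "password", "hunter2"]
found = crack("X03MO1qnZdYdgyfeuILPmQ", wordlist)
```

The attacker only ever runs the algorithm forwards, which means a password that isn't on the candidate list simply isn't found.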