Drone/Woodpecker autoscaling on Hetzner

Since the official drone.io autoscaler doesn’t work properly, and isn’t even open source, I implemented my own, in Rust of course.

If you just want to skip the why and how and get it, feel free to jump to the source code repo.

The why

I want my infrastructure to cost as little as possible while still being enough for my use cases. Those two requirements clashed once I wanted my CI system to support compiling Rust code. Compiling Rust takes quite a lot of resources, but I only need those resources occasionally.

I could have upgraded my constantly running servers to handle the extra load. But then I’d be constantly paying more for them, even if they sit unused for weeks when I don’t touch any Rust code.

So I looked into the official autoscaler and the various VM runners that the Drone docs mention. It seemed to me, though, that the Drone docs were quite severely out of date, and that the actual open source version didn’t properly support any autoscaling. So I did the only unreasonable thing, and made my own quick implementation in Rust.

The how

The how can be split into two parts:

Figuring out the state of the CI builds
Managing the servers

The rest in the middle is just the business logic of when to deploy and delete the servers.

Getting builds status from drone

Drone doesn’t seem to document its APIs neatly anywhere. But since it’s open source, I just dug around the source code to find the stats endpoint.

The information it offers is very minimal, as can be seen from this response snippet:

{
  "builds": {
    "pending": 2,
    "running": 0,
    "total": 550
  }
}

But it’s at least enough to know when there’s demand for servers, and when that demand has gone away. It doesn’t, for example, tell us what kind of runners are in demand, or which servers are currently executing builds. So our scaler will be quite limited, but that’s fine for the scope outlined in the beginning.
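To make the limitation concrete, here is a minimal sketch of what a scaling decision can look like when only the pending and running counts are known. The type names, the `decide` function, and the "one server per outstanding build, capped at a maximum" policy are all assumptions for illustration, not necessarily what the actual scaler does. Since the stats don’t say which server is busy, the sketch only ever scales down once nothing is pending or running.

```rust
/// Build counts, as reported by the stats endpoint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BuildStats {
    pub pending: u32,
    pub running: u32,
}

/// What the scaler should do on this polling tick.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Create this many additional servers.
    ScaleUp(u32),
    /// Delete every autoscaled server.
    ScaleDownAll,
    /// Leave things as they are.
    Nothing,
}

/// Hypothetical policy: aim for one server per outstanding build, capped at
/// `max_servers`, and remove everything once the queue is completely empty.
pub fn decide(stats: BuildStats, current_servers: u32, max_servers: u32) -> Action {
    let demand = stats.pending.saturating_add(stats.running);
    if demand == 0 {
        // We can't tell which server is idle, so only scale down when all are.
        return if current_servers > 0 {
            Action::ScaleDownAll
        } else {
            Action::Nothing
        };
    }
    let wanted = demand.min(max_servers);
    if wanted > current_servers {
        Action::ScaleUp(wanted - current_servers)
    } else {
        Action::Nothing
    }
}
```

With the response snippet from earlier (2 pending, 0 running), no servers yet and a cap of 3, `decide` would ask for two new servers.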

When I migrated to Woodpecker from Drone and added support for it, I used Woodpecker’s Prometheus endpoint instead, which exposed the same details.
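Reading a value out of a Prometheus endpoint doesn’t need a dependency either, since the text exposition format is line based. The following is a rough sketch of that idea, not the actual code: `metric_value` is a made-up helper, and the metric name you pass in has to be looked up from your own Woodpecker version’s metrics output, since I’m not assuming any particular name here. It is also naive about label values that contain a `}`.

```rust
/// Returns the value of the first sample named `metric` in a Prometheus
/// text-format body, ignoring comments, labels, and optional timestamps.
pub fn metric_value(body: &str, metric: &str) -> Option<f64> {
    body.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .find_map(|line| {
            let rest = line.strip_prefix(metric)?;
            // Make sure we matched the whole name, not a prefix of a longer one.
            let rest = match rest.chars().next()? {
                // Skip a label set such as {kind="docker"} (naive: no '}' in values).
                '{' => &rest[rest.find('}')? + 1..],
                c if c.is_whitespace() => rest,
                _ => return None,
            };
            // The first token after the name and labels is the sample value.
            rest.split_whitespace().next()?.parse().ok()
        })
}
```

Calling it once for the pending metric and once for the running metric gives the same two numbers the Drone stats endpoint provided.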

Managing the servers

Now, I chose Hetzner because they have a properly documented API, and cheap servers in the same country as my own infrastructure. Their billing is also hourly, which is good enough for this. There are of course alternatives like DigitalOcean’s droplets, but they were generally more expensive and not as close by.

Actually creating the Hetzner servers was as simple as making a single REST request with a specific userdata field. The userdata is passed to cloud-init, which allows running our own startup command. In the autoscaler glue code, we just generate a suitable runcmd that starts the Docker runner and points it at the Drone management server.
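As a rough sketch of that glue (not the real code), the following builds a cloud-config document with a single runcmd entry and wraps it in the JSON body for a `POST https://api.hetzner.cloud/v1/servers` request. The `RunnerConfig` struct and all function names are made up for this example. It assumes a server image that already has Docker installed, and a secret without shell metacharacters. The environment variables are the ones the Drone Docker runner documents.

```rust
/// Runner settings; the struct and its field names are illustrative only.
pub struct RunnerConfig<'a> {
    pub rpc_proto: &'a str,
    pub rpc_host: &'a str,
    pub rpc_secret: &'a str,
    pub capacity: u32,
}

/// Encodes `s` as a JSON string literal (also valid as a YAML double-quoted scalar).
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a `#cloud-config` document whose runcmd starts the Drone Docker runner.
pub fn user_data(cfg: &RunnerConfig) -> String {
    let cmd = format!(
        "docker run --detach --restart=always \
         --volume=/var/run/docker.sock:/var/run/docker.sock \
         --env=DRONE_RPC_PROTO={} --env=DRONE_RPC_HOST={} \
         --env=DRONE_RPC_SECRET={} --env=DRONE_RUNNER_CAPACITY={} \
         drone/drone-runner-docker:1",
        cfg.rpc_proto, cfg.rpc_host, cfg.rpc_secret, cfg.capacity
    );
    format!("#cloud-config\nruncmd:\n  - {}\n", json_string(&cmd))
}

/// Builds the JSON body for the Hetzner "create server" request.
pub fn create_server_body(
    name: &str,
    server_type: &str,
    image: &str,
    location: &str,
    user_data: &str,
) -> String {
    format!(
        "{{\"name\":{},\"server_type\":{},\"image\":{},\"location\":{},\"user_data\":{}}}",
        json_string(name),
        json_string(server_type),
        json_string(image),
        json_string(location),
        json_string(user_data)
    )
}
```

The body is then sent with an `Authorization: Bearer <token>` header. Deleting a server later is a `DELETE` request against the same API using the server’s id.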

Summary

With all that, setting up an autoscaler that scales up to a few servers and down to zero was quite easy. I implemented the code in Rust of course, using only very minimal dependencies instead of the pre-existing Hetzner API crate. Thanks to that, the resulting container image is only 1.69MB, which is very nice :'D

Additionally, I added automation in my CI to package and publish it as a Docker image. The rest of my infrastructure updates its containers via Watchtower, so any future improvements will now be deployed automatically as well.

If you’re interested in setting it up for yourself, see the source code repo for more info.