Recently I've been getting annoyed with the Debian Administration website; too often it was slower than it should have been, considering the resources behind it.
As a brief recap, I have six nodes:
- 1 x MySQL Database - The only MySQL database I personally manage these days.
- 4 x Web Nodes.
- 1 x Misc server.
The misc server is designed to display events. There is a node.js listener which receives UDP messages and stores them in a rotating buffer. The messages might contain things like "User bob logged in", "Slaughter ran", etc. It's a neat hack which gives a good sense of what is going on cluster-wide.
I need to rationalize that code, but for the curious there's a very simple predecessor posted on GitHub.
Anyway, enough diversions: the database is tuned and "small". The misc server is almost entirely irrelevant, non-public, and not explicitly advertised.
So what do the web nodes run? Well, they run a lot. Potentially.
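For anyone who hasn't seen that kind of listener, the core idea (UDP datagrams in, a fixed-size rotating buffer of recent events kept) can be sketched in a few lines. This is a hypothetical Python illustration, not the node.js code itself; the buffer size and the port number are placeholders I've picked, not values from the real listener.

```python
import socket
from collections import deque


class EventBuffer:
    """A fixed-size rotating buffer holding the most recent event messages."""

    def __init__(self, size=100):
        # deque(maxlen=...) discards the oldest entry automatically once full.
        self.events = deque(maxlen=size)

    def handle_datagram(self, data):
        # Assumption: each UDP datagram carries one UTF-8 event line.
        self.events.append(data.decode("utf-8", errors="replace").strip())

    def recent(self):
        return list(self.events)


def run_listener(buffer, host="0.0.0.0", port=4444):
    """Receive UDP datagrams forever, feeding each one into the buffer.

    The port (4444) is a hypothetical placeholder, not the real one.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _addr = sock.recvfrom(4096)
        buffer.handle_datagram(data)
```

Because the transport is UDP, a sender on any node just fires off a datagram such as `b"User bob logged in"` and carries on; nothing waits for an acknowledgement from the misc server.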
Each web node has four services configured:
- Apache 2.x - All nodes.
- ucarp - All nodes.
- Pound - Master node.
- Varnish - Master node.
Apache runs the main site, listening on *:8080.
One of the nodes will be special and will claim a virtual IP provided via ucarp. That virtual IP is the end-point visitors actually hit.
Pound is configured to listen on the virtual IP and perform SSL termination, so incoming requests get proxied from "vip:443" to "vip:80". Varnish listens on "vip:80" and proxies them to the back-end Apache instances on port 8080.
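The chain of listeners (Pound on vip:443, Varnish on vip:80, Apache on :8080) can be modelled as a tiny routing table that a single request walks through. This is a sketch, not real Pound or Varnish configuration: the VIP and node addresses are hypothetical placeholders, and it assumes a Varnish cache miss so that every request reaches Apache.

```python
import random

# Hypothetical addresses: the real VIP and node IPs aren't given.
VIP = "10.0.0.100"
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]


def next_hop(service, rng):
    """Return the (service, address) that `service` forwards to, or None."""
    if service == "pound":
        # Pound terminates TLS on vip:443 and passes plain HTTP to Varnish.
        return "varnish", f"{VIP}:80"
    if service == "varnish":
        # Assume a cache miss: Varnish picks one Apache back-end at random.
        return "apache", f"{rng.choice(NODES)}:8080"
    return None  # Apache generates the response itself.


def trace_request(rng):
    """List every (service, address) hop a single HTTPS request passes through."""
    hops = [("pound", f"{VIP}:443")]
    while (hop := next_hop(hops[-1][0], rng)) is not None:
        hops.append(hop)
    return hops


if __name__ == "__main__":
    for service, address in trace_request(random.Random()):
        print(f"{service:8} {address}")
```

Only the last hop varies between requests; the first two always land on whichever node currently holds the virtual IP.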
The end result should be high availability. In the typical case, all four servers are alive and all is well.
If a single server dies and it is not the master, it will simply be dropped as a valid back-end. If it is the master, a new master will appear thanks to the magic of ucarp, and the remaining three will be used as expected.
I'm sure there is a pathological case in which all four hosts die, at which point the site will be down, but that should be atypical.
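The failure cases above can be captured in a small state function. This is a simplified, hypothetical stand-in for ucarp's election rather than its actual protocol: each node gets an assumed `skew` value, and the live node with the lowest value is treated as the one holding the virtual IP.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    skew: int  # hypothetical priority: the lowest live value claims the VIP
    alive: bool = True


def cluster_state(nodes):
    """Return (master, backends) for the current set of nodes.

    Simplified model: the live node with the lowest skew holds the virtual IP
    (and so runs Pound and Varnish); every live node serves Apache on :8080.
    """
    live = [n for n in nodes if n.alive]
    if not live:
        return None, []  # the pathological case: all hosts dead, site down
    master = min(live, key=lambda n: n.skew)
    return master.name, [n.name for n in live]
```

The master also appears in its own back-end list, since it runs Apache like every other node in addition to Pound and Varnish.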
Yes, I am prone to over-engineering. The site doesn't have any availability requirements that justify this setup, but it is good to experiment and learn things.
So, with this setup in mind, and with incoming requests divided (on average) at random across four hosts, why is the damn thing so slow?
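The "(on average)" qualifier is easy to check with a quick simulation: choosing a back-end uniformly at random sends close to a quarter of requests to each of four hosts over a long run. This is a hypothetical model of the balancing alone; the host names are placeholders, and it says nothing about where the slowness actually came from.

```python
import random
from collections import Counter


def simulate(requests, backends, seed=None):
    """Count how many of `requests` land on each back-end under random choice."""
    rng = random.Random(seed)
    return Counter(rng.choice(backends) for _ in range(requests))


if __name__ == "__main__":
    total = 100_000
    counts = simulate(total, ["web1", "web2", "web3", "web4"], seed=42)
    for name, n in sorted(counts.items()):
        print(f"{name}: {n / total:.1%}")
```

Over short windows the split can wobble noticeably, but with 100,000 requests each host's share stays within a fraction of a percent of 25%.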
We'll come back to that in the next post.
(Good news though; I fixed it ;)