I'm slowly planning the redesign of the cluster which powers the Debian Administration website.
Currently the design is simple, and looks like this:
In brief there is a load-balancer that handles SSL-termination and
then proxies to one of four Apache servers. These talk back and forth
to a MySQL database. Nothing too shocking, or unusual.
(In truth there are two database servers, and rather than a single
installation of HAProxy it runs upon each of the webservers, with one
elected as the master, holding the floating IP, via ucarp. Logically
though traffic routes through HAProxy to a number of Apache instances.
I can lose half of the servers and things still keep running.)
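To give a flavour of the front end, a minimal HAProxy configuration for
this kind of setup might look like the following. The backend names, IP
addresses, and certificate path are invented for illustration, and
real-world details such as timeouts and logging are left out:

    frontend https-in
        # Terminate SSL here, then proxy plain HTTP to the Apache instances.
        mode http
        bind :443 ssl crt /etc/haproxy/site.pem
        default_backend apache-nodes

    backend apache-nodes
        mode http
        balance roundrobin
        option httpchk GET /
        server web1 10.0.0.11:8080 check
        server web2 10.0.0.12:8080 check
        server web3 10.0.0.13:8080 check
        server web4 10.0.0.14:8080 check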
When I set up the site it all ran on one host; that was simpler, but
less highly available, and it also struggled to cope with the load.
Half the reason for writing/hosting the site in the first place was
to document learning experiences though, so when it came time to
make it scale I figured why not learn something and do it
neatly? Having it run on cheap and
reliable virtual hosts was a good excuse to bump the server-count
and the design has been stable for the past few years.
Recently though I've begun planning how it will be deployed in the
future and I have a new design:
Rather than having the Apache instances talk to the database I'll
indirect through an API-server. The API server will handle requests such as:
- POST /users/login
  - POST a username/password and return 200 if valid. If bogus
    details return 403. If the user doesn't exist return 404.
- GET /users/Steve
  - Return a JSON hash of user-information.
  - Return 404 on invalid user.
I expect to have four API handler endpoints: /articles,
/comments, /users & /weblogs. Again we'll use a
floating IP and an HAProxy instance to route to multiple API-servers,
each of which will use local caching to cache articles, etc.
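As an illustration of those routes, here's a minimal sketch in Python
using Flask. It's only a sketch: the real API-server needn't be Python,
the user store here is a hard-coded dictionary rather than MySQL, and
the local caching mentioned above is omitted:

    # Minimal sketch of the /users routes described above. The user data and
    # port are invented for illustration; a real server would query MySQL.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    USERS = {"Steve": {"username": "Steve", "realname": "Steve Kemp"}}
    PASSWORDS = {"Steve": "s3cr3t"}

    @app.route("/users/login", methods=["POST"])
    def login():
        username = request.form.get("username", "")
        password = request.form.get("password", "")
        if username not in PASSWORDS:
            return ("", 404)        # the user doesn't exist
        if PASSWORDS[username] != password:
            return ("", 403)        # bogus details
        return ("", 200)            # valid login

    @app.route("/users/<name>", methods=["GET"])
    def get_user(name):
        user = USERS.get(name)
        if user is None:
            return ("", 404)        # invalid user
        return jsonify(user)        # JSON hash of user-information

    if __name__ == "__main__":
        app.run(port=8081)

With this running, POSTing a valid username/password to /users/login
returns 200, bad credentials return 403, and an unknown user returns
404, matching the list above.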
This should make the middle layer, running on Apache, much simpler,
and increase throughput. I suspect, but haven't confirmed,
that making a single HTTP-request to fetch a (formatted) article body
will be cheaper than making N database queries.
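In other words the page-rendering code would make one call along these
lines, rather than issuing separate queries for the article text,
author, tags, and so on. The host name and the /articles/<id> route are
placeholders, not part of the real design:

    # Hypothetical front-end helper: one HTTP request to the API layer returns
    # the formatted article, instead of several separate MySQL queries.
    import requests

    def fetch_article(article_id):
        # "api.internal" and the route are placeholders for illustration.
        url = "http://api.internal:8081/articles/%d" % article_id
        response = requests.get(url, timeout=2)
        response.raise_for_status()
        return response.json()   # e.g. {"title": ..., "body": ..., "tags": [...]}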
Anyway that's what I'm slowly pondering and working on at the
moment. I wrote a proof of
concept API-server based CMS two years ago, and my recollection of
that time is that it was fast to develop, and easy to scale.
Tags: cluster, debian-administration, haproxy
29 February 2016 21:50
Recently I had a conversation with a programmer who repeated the adage
that programming in perl consists of writing line-noise. This isn't true,
but it reminded me of my love of fuzzers. Fuzzers are often used to
generate random input files which are fed to tools, looking for security
problems, segfaults, and similar hilarity.
To the untrained eye the output of most fuzzers is essentially
line-noise, since you often start with a valid input file and then
flip bits, swap bytes, and append garbage.
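The mutation step itself is trivial. Something like this toy sketch (in
Python, and not any particular real fuzzer) is enough to produce
convincing line-noise:

    # Toy mutation step: flip bits, swap bytes, and append garbage, as described
    # above. A real fuzzer wraps this in corpus management and crash triage.
    import random

    def mutate(data):
        data = bytearray(data)
        for _ in range(random.randint(1, 8)):
            choice = random.random()
            if choice < 0.4 and data:                # flip a random bit
                i = random.randrange(len(data))
                data[i] ^= 1 << random.randrange(8)
            elif choice < 0.8 and len(data) > 1:     # swap two random bytes
                i, j = random.randrange(len(data)), random.randrange(len(data))
                data[i], data[j] = data[j], data[i]
            else:                                    # append some garbage bytes
                data += bytes(random.randrange(256)
                              for _ in range(random.randint(1, 16)))
        return bytes(data)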
Anyway this made me wonder what would happen if you fed random garbage into a perl
interpreter. I wasn't brave enough to try it, because knowing my luck
the fuzzer would write a program like so:
system( "rm -rf /home/steve" );
But I figured it was still an interesting idea, and I could have a go at
fuzzing something else. I picked
gawk, the GNU implementation of
awk, because the codebase is pretty small, and I understand it.
Almost immediately my fuzzer found some interesting segfaults and
problems. Here's a nice simple example:
$ gawk 'for (i = ) in steve kemp rocks'
gawk: cmd. line:1: fatal error: internal error: segfault
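For anyone wanting to play along, the driver loop needn't be clever
either. A crude version, reusing the hypothetical mutate() sketch from
above with an arbitrary seed program, might look like this:

    # Crude fuzzing driver: run gawk on mutated programs and keep any input
    # that makes it die from a signal (negative return code; -11 is SIGSEGV).
    # Relies on the mutate() sketch above; the seed program is arbitrary.
    import subprocess

    seed = b'BEGIN { for (i = 0; i < 10; i++) print i }'

    for n in range(10000):
        program = mutate(seed)
        arg = program.decode("latin-1").replace("\x00", "")  # argv can't carry NULs
        try:
            result = subprocess.run(["gawk", arg], input=b"",
                                    capture_output=True, timeout=5)
        except subprocess.TimeoutExpired:
            continue
        if result.returncode < 0:   # killed by a signal; worth investigating
            print("crash #%d (signal %d): %r" % (n, -result.returncode, program))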
I look forward to seeing what happens when other people fuzz perl..
Tags: fun, fuzzing, gawk