About Archive Tags RSS Feed

 

Entries tagged yawns

Rebooting the CMS

9 August 2014 21:50

I run a cluster for the Debian Administration website, and the code is starting to show its age. Unfortunately the code is not so modern, and has evolved a lot of baggage.

Given the relatively clean separation between the logical components I'm interested in trying something new. In brief the current codebase allows:

  • Posting of articles, blog-entries, and polls.
  • The manipulation of the same.
  • User-account management.

It crossed my mind the other night that it might make sense to break this code down into a number of mini-servers - a server to handle all article-related things, a server to handle all poll-related things, etc.

If we have a JSON endpoint that will allow:

  • GET /article/32
  • POST /article/ [create]
  • GET /articles/offset/number [get the most recent]

Then we could have a very thin shim/server on top of that whihc would present the public API. Of course the internal HTTP overhead might make this unworkable, but it is an interesting approach to the problem, and would allow the backend storage to be migrated in the future without too much difficulty.

At the moment I've coded up two trivial servers, one for getting user-data (to allow login requests to succeed), and one for getting article data.

There is a tiny presentation server written to use those back-end servers and it seems like an approach that might work. Of course deployment might be a pain..

It is still an experiment rather than a plan, but it could work out: http://github.com/skx/snooze/.

| No comments

 

Updating Debian Administration

21 August 2014 21:50

Recently I've been getting annoyed with the Debian Administration website; too often it would be slower than it should be considering the resources behind it.

As a brief recap I have six nodes:

  • 1 x MySQL Database - The only MySQL database I personally manage these days.
  • 4 x Web Nodes.
  • 1 x Misc server.

The misc server is designed to display events. There is a node.js listener which receives UDP messages and stores them in a rotating buffer. The messages might contain things like "User bob logged in", "Slaughter ran", etc. It's a neat hack which gives a good feeling of what is going on cluster-wide.

I need to rationalize that code - but there's a very simple predecessor posted on github for the curious.

Anyway enough diversions, the database is tuned, and "small". The misc server is almost entirely irrelevent, non-public, and not explicitly advertised.

So what do the web nodes run? Well they run a lot. Potentially.

Each web node has four services configured:

  • Apache 2.x - All nodes.
  • uCarp - All nodes.
  • Pound - Master node.
  • Varnish - Master node.

Apache runs the main site, listening on *:8080.

One of the nodes will be special and will claim a virtual IP provided via ucarp. The virtual IP is actually the end-point visitors hit, meaning we have:

Master HostOther hosts

Running:

  • Apache.
  • Pound.
  • Varnish

Running:

  • Apache.

Pound is configured to listen on the virtual IP and perform SSL termination. That means that incoming requests get proxied from "vip:443 -> vip:80". Varnish listens on "vip:80" and proxies to the back-end apache instances.

The end result should be high availability. In the typical case all four servers are alive, and all is well.

If one server dies, and it is not the master, then it will simply be dropped as a valid back-end. If a single server dies and it is the master then a new one will appear, thanks to the magic of ucarp, and the remaining three will be used as expected.

I'm sure there is a pathological case when all four hosts die, and at that point the site will be down, but that's something that should be atypical.

Yes, I am prone to over-engineering. The site doesn't have any availability requirements that justify this setup, but it is good to experiment and learn things.

So, with this setup in mind, with incoming requests (on average) being divided at random onto one of four hosts, why is the damn thing so slow?

We'll come back to that in the next post.

(Good news though; I fixed it ;)

| 4 comments

 

Updating Debian Administration, the code

23 August 2014 21:50

So I previously talked about the setup behind Debian Administration, and my complaints about the slownes.

The previous post talked about the logical setup, and the hardware. This post talks about the more interesting thing. The code.

The code behind the site was originally written by Denny De La Haye. I found it and reworked it a lot, most obviously adding structure and test cases.

Once I did that the early version of the site was born.

Later my version became the official version, as when Denny setup Police State UK he used my codebase rather than his.

So the code huh? Well as you might expect it is written in Perl. There used to be this layout:

yawns/cgi-bin/index.cgi
yawns/cgi-bin/Pages.pl
yawns/lib/...
yawns/htdocs/

Almost every request would hit the index.cgi script, which would parse the request and return the appropriate output via the standard CGI interface.

How did it know what you wanted? Well sometimes there would be a paramater set which would be looked up in a dispatch-table:

/cgi-bin/index.cgi?article=40         - Show article 40
/cgi-bin/index.cgi?view_user=Steve    - Show the user Steve
/cgi-bin/index.cgi?recent_comments=10 - Show the most recent comments.

Over time the code became hard to update because there was no consistency, and over time the site became slow because this is not a quick setup. Spiders, bots, and just average users would cause a lot of perl processes to run.

So? What did I do? I moved the thing to using FastCGI, which avoids the cost of forking Perl and loading (100k+) the code.

Unfortunately this required a bit of work because all the parameter handling was messy and caused issues if I just renamed index.cgi -> index.fcgi. The most obvious solution was to use one parameter, globally, to specify the requested mode of operation.

Hang on? One parameter to control the page requested? A persistant environment? What does that remind me of? Yes. CGI::Application.

I started small, and pulled some of the code out of index.cgi + Pages.pl, and over into a dedicated CGI::Application class:

  • Application::Feeds - Called via /cgi-bin/f.fcgi.
  • Application::Ajax - Called via /cgi-bin/a.fcgi.

So now every part of the site that is called by Ajax has one persistent handler, and every part of the site which returns RSS feeds has another.

I had some fun setting up the sessions to match those created by the old stuff, but I quickly made it work, as this example shows:

The final job was the biggest, moving all the other (non-feed, non-ajax) modes over to a similar CGI::Application structure. There were 53 modes that had to be ported, and I did them methodically, first porting all the Poll-related requests, then all the article-releated ones, & etc. I think I did about 15 a day for three days. Then the rest in a sudden rush.

In conclusion the code is now fast because we don't use CGI, and instead use FastCGI.

This allowed minor changes to be carried out, such as compiling the HTML::Template templates which determine the look and feel, etc. Those things don't make sense in the CGI environment, but with persistence they are essentially free.

The site got a little more of a speed boost when I updated DNS, and a lot more when I blacklisted a bunch of IP-space.

As I was wrapping this up I realized that the code had accidentally become closed - because the old repository no longer exists. That is not deliberate, or intentional, and will be rectified soon.

The site would never have been started if I'd not seen Dennys original project, and although I don't think others would use the code it should be possible. I remember at the time I was searching for things like "Perl CMS" and finding Slashcode, and Scoop, which I knew were too heavyweight for my little toy blog.

In conclusion Debian Administration website is 10 years old now. It might not have changed the world, it might have become less relevant, but I'm glad I tried, and I'm glad there were years when it really was the best place to be.

These days there are HowtoForges, blogs, spam posts titled "How to install SSH on Trusty", "How to install SSH on Wheezy", "How to install SSH on Precise", and all that. No shortage of content, just finding the good from the bad is the challenge.

Me? The single best resource I read these days is probably LWN.net.

Starting to ramble now.

Go look at my quick hack for remote command execution https://github.com/skx/nanoexec ?

| No comments

 

Updates on git-hosting and load-balancing

25 August 2014 21:50

To round up the discussion of the Debian Administration site yesterday I flipped the switch on the load-balancing. Rather than this:

  https -> pound \
                  \
  http  -------------> varnish  --> apache

We now have the simpler route for all requests:

http  -> haproxy -> apache
https -> haproxy -> apache

This means we have one less HTTP-request for all incoming secure connections, and these days secure connections are preferred since a Strict-Transport-Security header is set.

In other news I've been juggling git repositories; I've setup an installation of GitBucket on my git-host. My personal git repository used to contain some private repositories and some mirrors.

Now it contains mirrors of most things on github, as well as many more private repositories.

The main reason for the switch was to get a prettier interface and bug-tracker support.

A side-benefit is that I can use "groups" to organize repositories, so for example:

Most of those are mirrors of the github repositories, but some are new. When signed in I see more sources, for example the source to http://steve.org.uk.

I've been pleased with the setup and performance, though I had to add some caching and some other magic at the nginx level to provide /robots.txt, etc, which are not otherwise present.

I'm not abandoning github, but I will no longer be using it for private repositories (I was gifted a free subscription a year or three ago), and nor will I post things there exclusively.

If a single canonical source location is required for a repository it will be one that I control, maintain, and host.

I don't expect I'll give people commit access on this mirror, but it is certainly possible. In the past I've certainly given people access to private repositories for collaboration, etc.

| No comments