So I previously talked about the setup behind Debian Administration, and my complaints about the slowness.
The previous post talked about the logical setup, and the hardware. This post talks about the more interesting thing. The code.
The code behind the site was originally written by Denny De La Haye. I found it and reworked it a lot, most obviously adding structure and test cases.
Once I did that the early version of the site was born.
Later my version became the official version, as when Denny set up Police State UK he used my codebase rather than his.
So the code, huh? Well, as you might expect, it is written in Perl. There used to be this layout:
yawns/cgi-bin/index.cgi
yawns/cgi-bin/Pages.pl
yawns/lib/...
yawns/htdocs/
Almost every request would hit the index.cgi script, which would parse the request and return the appropriate output via the standard CGI interface.
How did it know what you wanted? Well, sometimes there would be a parameter set which would be looked up in a dispatch-table:
/cgi-bin/index.cgi?article=40 - Show article 40
/cgi-bin/index.cgi?view_user=Steve - Show the user Steve
/cgi-bin/index.cgi?recent_comments=10 - Show the most recent comments.
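For anyone who hasn't met a dispatch-table before, here is a rough sketch of the idea in plain Perl. It is not the original index.cgi code: the handler bodies, the `handle_request` name, and the front-page fallback are all made up for illustration, and the request is passed in as a simple list of parameters rather than parsed from a real CGI query.

```perl
use strict;
use warnings;

# Hypothetical handlers, keyed on the parameter names from the URLs above.
my %dispatch = (
    article         => sub { "Show article $_[0]" },
    view_user       => sub { "Show the user $_[0]" },
    recent_comments => sub { "Show the $_[0] most recent comments" },
);

# Look for a known parameter in the request and call its handler.
sub handle_request {
    my (%params) = @_;
    for my $name (sort keys %dispatch) {
        return $dispatch{$name}->($params{$name}) if exists $params{$name};
    }
    return "Show the front page";    # assumed fallback, not from the post
}

print handle_request(article   => 40),      "\n";
print handle_request(view_user => 'Steve'), "\n";
```

Every new feature means another parameter name and another entry in the table, which is roughly how this kind of setup drifts into inconsistency.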
Over time the code became hard to update because there was no consistency, and over time the site became slow because plain CGI is not a quick setup: every request starts a fresh Perl process. Spiders, bots, and just average users would cause a lot of Perl processes to run.
So? What did I do? I moved the thing to using FastCGI, which avoids the cost of forking Perl and loading the code (100k+) on every request.
Unfortunately this required a bit of work because all the parameter handling was messy and caused issues if I just renamed index.cgi -> index.fcgi. The most obvious solution was to use one parameter, globally, to specify the requested mode of operation.
Hang on? One parameter to control the page requested? A persistent environment? What does that remind me of? Yes. CGI::Application.
I started small, and pulled some of the code out of index.cgi + Pages.pl, and over into a dedicated CGI::Application class:
- Application::Feeds - Called via /cgi-bin/f.fcgi.
- Application::Ajax - Called via /cgi-bin/a.fcgi.
So now every part of the site that is called by Ajax has one persistent handler, and every part of the site which returns RSS feeds has another.
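To show the shape of "one parameter picks the mode, one persistent handler serves many requests", here is a minimal sketch in plain Perl. It deliberately does not use CGI::Application or any FastCGI library, it only imitates the pattern. The `Sketch::Feeds` package, the mode names, and the parameter name `mode` are assumptions for illustration, and the "requests" are just hashes in a loop.

```perl
use strict;
use warnings;

package Sketch::Feeds;    # hypothetical stand-in for a class like Application::Feeds

sub new {
    my ($class) = @_;
    # Anything expensive would be set up here, once per process.
    return bless {
        start_mode => 'recent_articles',
        run_modes  => {
            recent_articles => sub { "feed: recent articles" },
            recent_comments => sub { "feed: recent comments" },
        },
    }, $class;
}

# Pick the handler from the single "mode" parameter, falling back to
# the start mode when the parameter is missing.
sub run {
    my ($self, %params) = @_;
    my $mode = defined $params{mode} ? $params{mode} : $self->{start_mode};
    my $handler = $self->{run_modes}{$mode}
        or return "error: unknown mode '$mode'";
    return $handler->(%params);
}

package main;

my $app = Sketch::Feeds->new;    # built once, before the request loop

# Simulated requests; under FastCGI this loop would be fed by the web server.
for my $request ({ mode => 'recent_comments' }, {}, { mode => 'bogus' }) {
    print $app->run(%$request), "\n";
}
```

The point is that the class is loaded once and then reused, and that an unknown mode has exactly one place where it gets rejected.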
I had some fun setting up the sessions to match those created by the old stuff, but I quickly made it work, as this example shows:
The final job was the biggest: moving all the other (non-feed, non-ajax) modes over to a similar CGI::Application structure. There were 53 modes that had to be ported, and I did them methodically, first porting all the poll-related requests, then all the article-related ones, etc. I think I did about 15 a day for three days. Then the rest in a sudden rush.
In conclusion the code is now fast because we don't use CGI, and instead use FastCGI.
This allowed minor changes to be carried out, such as compiling the HTML::Template templates which determine the look and feel, etc. Those things don't make sense in the CGI environment, but with persistence they are essentially free.
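Why is that free with persistence? Because a cache held in memory survives from one request to the next. The sketch below is plain Perl and does not use HTML::Template at all: `compile_template` is a made-up stand-in for an expensive compile step, and the `{name}` placeholder syntax is invented for the example.

```perl
use strict;
use warnings;

my %compiled;             # lives as long as the process does
my $compile_count = 0;    # only here to show how often we compile

# Hypothetical "expensive" step: turn a template string into a closure
# that fills in {placeholders} from the variables it is given.
sub compile_template {
    my ($source) = @_;
    $compile_count++;
    return sub {
        my (%vars) = @_;
        (my $out = $source) =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : ''/ge;
        return $out;
    };
}

# Compile on first use, reuse the cached closure afterwards.
sub render {
    my ($source, %vars) = @_;
    my $template = $compiled{$source} ||= compile_template($source);
    return $template->(%vars);
}

print render('Hello, {name}', name => 'Steve'), "\n";
print render('Hello, {name}', name => 'Denny'), "\n";
print "compiled $compile_count time(s)\n";
```

Under plain CGI the `%compiled` hash would be thrown away after every request, so the cache would never get a hit. In a persistent process the second request onwards pays nothing for the compile.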
The site got a little more of a speed boost when I updated DNS, and a lot more when I blacklisted a bunch of IP-space.
As I was wrapping this up I realized that the code had accidentally become closed - because the old repository no longer exists. That is not deliberate, or intentional, and will be rectified soon.
The site would never have been started if I'd not seen Denny's original project, and although I don't think others would use the code, it should be possible. I remember at the time I was searching for things like "Perl CMS" and finding Slashcode, and Scoop, which I knew were too heavyweight for my little toy blog.
In conclusion, the Debian Administration website is 10 years old now. It might not have changed the world, it might have become less relevant, but I'm glad I tried, and I'm glad there were years when it really was the best place to be.
These days there are HowtoForges, blogs, spam posts titled "How to install SSH on Trusty", "How to install SSH on Wheezy", "How to install SSH on Precise", and all that. No shortage of content, just finding the good from the bad is the challenge.
Me? The single best resource I read these days is probably LWN.net.
Starting to ramble now.
Go look at my quick hack for remote command execution https://github.com/skx/nanoexec ?
Tags: debian-administration, yawns