Once upon a time I wrote a blog compiler, a simple tool that would read in a bunch of text files and output a blog. This blog would contain little hierarchies for tags, historical archives, etc. It would also have a number of RSS feeds.
Every now and again somebody will compare it to ikiwiki and I'll ignore that comparison entirely, because the two tools do different things in completely different fashions.
But I was interested to see Joey talk about performance tweaks recently as I have a blog which has about 900 pages, and which takes just over 2 minutes to build from start to finish. (Not this one!)
I've been pondering performance for a while as I know my current approach is not suited to high speed. Currently the compiler reads in every entry and builds a giant data structure in memory which is walked in different fashions to generate and output pages.
The speed issue comes about because storing the data structure entirely in memory is insane, and because sometimes a single entry will be read from disk multiple times.
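To make that concrete, the old pipeline amounts to something like the following minimal sketch (Python rather than chronicle's actual code; the file layout and field names are invented for illustration):

```python
import os

def read_entry(path):
    # Hypothetical entry format: a title line, a blank line, then the body.
    with open(path, encoding="utf-8") as handle:
        title = handle.readline().strip()
        body = handle.read()
    return {"path": path, "title": title, "body": body}

# Build the giant in-memory structure: every entry, all at once.
entries = [read_entry(os.path.join("data", name))
           for name in sorted(os.listdir("data"))]

# ...and then walk it again and again: once for the tag hierarchy, once for
# the archives, once per RSS feed.  Everything stays resident until the end.
```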
I've made some changes over the past few evenings such that a single blog entry will be read no more than once from disk (and perhaps zero times if Memcached is in use :), but that doesn't solve the problem of memory usage.
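That change is really just a cache in front of the entry reads. A minimal sketch, assuming Python and the pymemcache client (the key scheme and function names are my own invention, not chronicle's):

```python
import hashlib
import os

from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def entry_key(path):
    # Key on path plus mtime so an edited entry simply misses the cache.
    stamp = os.path.getmtime(path)
    return hashlib.sha1(f"{path}:{stamp}".encode()).hexdigest()

def read_entry(path):
    key = entry_key(path)
    cached = cache.get(key)            # zero disk reads if a previous run stored it
    if cached is not None:
        return cached.decode("utf-8")
    with open(path, encoding="utf-8") as handle:
        text = handle.read()           # otherwise at most one disk read per build
    cache.set(key, text.encode("utf-8"))
    return text
```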
So last night I made a quick hack - using my introduction to SQLite as inspiration, I wrote a minimal reimplementation of chronicle which does things differently (sketched in code after the list below):
- Creates a temporary SQLite database with tables: posts, tags, comments.
- Reads every blog entry and inserts it into the database.
- Uses the database to output pages.
- Deletes the database.
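Here's roughly what those four steps look like, as a sketch using Python's sqlite3 module; the column names, file layout, and rendering call are invented for illustration and are not chronicle's actual schema:

```python
import glob
import os
import sqlite3

db = sqlite3.connect("blog.tmp.db")          # temporary working database
db.executescript("""
    CREATE TABLE posts    (id INTEGER PRIMARY KEY, path TEXT, title TEXT, body TEXT);
    CREATE TABLE tags     (post_id INTEGER, tag TEXT);
    CREATE TABLE comments (post_id INTEGER, author TEXT, body TEXT);
""")

# Pass 1: read every entry exactly once and insert it.
for path in glob.glob("data/*.txt"):
    with open(path, encoding="utf-8") as handle:
        title = handle.readline().strip()
        body = handle.read()
    cur = db.execute("INSERT INTO posts (path, title, body) VALUES (?, ?, ?)",
                     (path, title, body))
    post_id = cur.lastrowid
    # Tag parsing and comment import would happen here, in the same pass.

# Pass 2: let the database do the walking, one query per output view.
for (tag,) in db.execute("SELECT DISTINCT tag FROM tags"):
    rows = db.execute("SELECT title FROM posts JOIN tags ON posts.id = tags.post_id "
                      "WHERE tags.tag = ?", (tag,)).fetchall()
    # render_tag_page(tag, rows)  -- hypothetical output step

db.close()
os.remove("blog.tmp.db")                     # final step: delete the database
```

The pleasant side-effect is that the "walking" now happens as SQL queries, so nothing beyond the current result set needs to live in memory.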
This is a significantly faster approach than the previous one - a "make steve" job now takes only 18 seconds, down from just over 2 minutes 5 seconds.
("make steve" uses rsync to pull in comments on entries, rebuilds the blog, then uses rsync to push the generated output into its live location.)