There's no such thing as a wrong war

13 October 2009 21:50

Once upon a time I wrote a blog compiler, a simple tool that would read in a bunch of text files and output a blog. This blog would contain little hierarchies for tags, historical archives, etc. It would also have a number of RSS feeds.

Every now and again somebody will compare it to ikiwiki and I'll ignore that comparison entirely, because the two tools do different things in completely different fashions.

But I was interested to see Joey talk about performance tweaks recently as I have a blog which has about 900 pages, and which takes just over 2 minutes to build from start to finish. (Not this one!)

I've been pondering performance for a while as I know my current approach is not suited to high speed. Currently the compiler reads in every entry and builds a giant data structure in memory which is walked in different fashions to generate and output pages.

The speed issue comes about because storing the data structure entirely in memory is insane, and because sometimes a single entry will be read from disk multiple times.

I've made some changes over the past few evenings such that a single blog entry will be read no more than once from disk (and perhaps zero times if Memcached is in use :) but that doesn't solve the problem of the memory usage.
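
A minimal sketch of the read-once idea, in Python (this is not chronicle's actual code): a hypothetical `read_entry` helper keeps each file's text in an in-process dictionary, so later passes over the same entry never touch the disk again. The names `read_entry`, `_cache` and `stats` are invented for illustration; a Memcached-backed variant would replace the dictionary lookup with a cache-client lookup.

```python
import os
import tempfile

_cache = {}                  # path -> entry text (hypothetical in-process cache)
stats = {"disk_reads": 0}    # counts real reads, only to show the effect


def read_entry(path):
    """Return the text of a blog entry, reading it from disk at most once."""
    if path not in _cache:
        with open(path, encoding="utf-8") as handle:
            _cache[path] = handle.read()
        stats["disk_reads"] += 1
    return _cache[path]


# Demonstration with a throwaway entry file.
with tempfile.TemporaryDirectory() as entries:
    demo_path = os.path.join(entries, "hello.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("Title: Hello\n\nFirst post.\n")

    first = read_entry(demo_path)    # read from disk
    second = read_entry(demo_path)   # served from the cache
```

The cache fixes the repeated reads but, as noted, every entry still ends up held in memory.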

So last night I made a quick hack - using my introduction to SQLite as inspiration I wrote a minimal reimplementation of chronicle which does things differently:

  • Creates a temporary SQLite database with tables: posts, tags, comments.
  • Reads every blog entry and inserts it into the database.
  • Uses the database to output pages.
  • Deletes the database.
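
The four steps above can be sketched roughly as follows, using Python's standard `sqlite3` module. This is an illustration under assumptions, not the real reimplementation: only the table names (posts, tags, comments) come from the list above, while the column names, the `ENTRIES` sample data and the helpers `build_database` and `titles_for_tag` are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-ins for entries already parsed from *.txt files.
ENTRIES = [
    {"title": "First post", "date": "2009-10-01", "body": "Hello.",
     "tags": ["meta"]},
    {"title": "Speeding up", "date": "2009-10-13", "body": "SQLite!",
     "tags": ["meta", "sqlite"]},
]


def build_database(path, entries):
    """Create the temporary database and insert every entry into it."""
    db = sqlite3.connect(path)
    # Column layout is an assumption; the post only names the tables.
    db.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, date TEXT, body TEXT);
        CREATE TABLE tags (post_id INTEGER, name TEXT);
        CREATE TABLE comments (post_id INTEGER, author TEXT, body TEXT);
        """
    )
    for entry in entries:
        cursor = db.execute(
            "INSERT INTO posts (title, date, body) VALUES (?, ?, ?)",
            (entry["title"], entry["date"], entry["body"]),
        )
        db.executemany(
            "INSERT INTO tags (post_id, name) VALUES (?, ?)",
            [(cursor.lastrowid, tag) for tag in entry["tags"]],
        )
    db.commit()
    return db


def titles_for_tag(db, tag):
    """Return the titles of posts carrying a tag, newest first."""
    rows = db.execute(
        "SELECT posts.title FROM posts "
        "JOIN tags ON tags.post_id = posts.id "
        "WHERE tags.name = ? ORDER BY posts.date DESC",
        (tag,),
    )
    return [title for (title,) in rows]


# Create the database, query it as a page generator would, then delete it.
handle, db_path = tempfile.mkstemp(suffix=".db")
os.close(handle)
db = build_database(db_path, ENTRIES)
meta_titles = titles_for_tag(db, "meta")
sqlite_titles = titles_for_tag(db, "sqlite")
db.close()
os.remove(db_path)
```

Each output page then becomes one query against the database rather than a walk over a structure held entirely in memory.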

This is a significantly faster approach than the previous one - with a "make steve" job taking only 18 seconds, down from just over 2 minutes 5 seconds.

("make steve" uses rsync to pull in comments on entries, rebuilds the blog, then uses rsync to push the generated output into its live location.)

ObFilm: If...

Comments on this entry

Charles Darke at 20:17 on 13 October 2009
http://digitalconsumption.com

But if you're going down that route, why not store everything in the database instead of the flat files?

Steve Kemp at 20:22 on 13 October 2009

While storing details about posts, comments, tags, etc. in a database for the purpose of generation is fine, I love the fact that everything is flat-file based.

Too often in the past I've seen dynamic blogs cause all kinds of problems - be they PHP, Ruby, or Perl-based. I simply do not believe that a blog should be resource intensive and require the use of a dynamic back-end.

I think I am in a minority with this view, but perhaps not a small one given how many people seem to like the project and similar ones.

tek at 20:55 on 13 October 2009

I've always written my own blog software (from the days of Perl/CSV to PHP/Perl & MySQL) and personally I prefer the dynamic approach, but of course it means having more than one single point of failure. To this end I publish out pages that are flat, but they do not get used unless the database is unavailable for some reason. I've always found this to be the most flexible way IMHO.

18 seconds is impressive though :-)

Jon at 12:08 on 14 October 2009

Re: ikiwiki comparisons. From the outside, it's hard to see why they are that different and why comparisons are irrelevant. Are you sure you don't suffer from a mild dose of NIH syndrome?

Steve Kemp at 13:30 on 14 October 2009

I think it's plain enough:


ikiwiki

Generalised wiki compiler.

Allows inter-connected pages of arbitrary kinds, and uses a nice flexible system such that changes and history are both pulled from a revision control system.

It also allows online editing, and different markups to be used with plugins to select different pages to handle specially.


chronicle

Converts *.txt into a blog, which is a loosely connected set of output HTML and XML pages.



Specifically chronicle has no reliance on revision control, no special page filters, no notion of history, no notion of online editing and is far less capable.

Or, to put it another way, far more specific in what it does, and wants to do.

Yes, you can run ikiwiki as a blog, but if you do that ... well, you're missing out on a bunch of stuff. Similarly, to create a website with chronicle would be an exercise in frustration.

Saint Aardvark at 16:04 on 14 October 2009
http://saintaardvarkthecarpeted.com

I'm with you, Steve -- database speed is nice, but flat files are forever.

As for speed gains, what about using a make-like approach? Back when I was writing my own (not nearly so nice) blog compiler, I used make to ensure it wasn't updating pages if they didn't need to be. Why rebuild a page if nothing has changed?
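
A minimal sketch of that make-style check, in Python and under stated assumptions: an output page is rebuilt only when it is missing or older than its source entry. The helper `needs_rebuild` and the file names are hypothetical, and the timestamps are set explicitly so the demonstration does not depend on filesystem clock resolution.

```python
import os
import tempfile


def needs_rebuild(source, target):
    """True if target is missing or older than source (the rule make applies)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "entry.txt")
    target = os.path.join(work, "entry.html")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("An entry.\n")

    before_build = needs_rebuild(source, target)   # no output page yet

    with open(target, "w", encoding="utf-8") as handle:
        handle.write("<p>An entry.</p>\n")
    os.utime(source, (1000, 1000))                 # source written earlier
    os.utime(target, (2000, 2000))                 # output generated later
    after_build = needs_rebuild(source, target)    # output is up to date

    os.utime(source, (3000, 3000))                 # simulate editing the entry
    after_edit = needs_rebuild(source, target)     # output is now stale
```

A per-file check like this suits individual entry pages; tag and archive pages depend on many entries, so they would need to be compared against all of their sources.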