Recently the Debian Administration website has been besieged by spammers submitting bogus comments.
I also run a couple of blogs which are constantly having random gibberish submitted to them - although it is less of a problem there because I approve them manually due to the offline nature of my blog, and the blog compiler I use.
Regardless I figured it was time to do something about it, in part because Chris Searle suggested that I should be able to query arbitrary text for spammyness due to my experience with fighting email spam. (As it turns out that is not so useful experience. Comments and emails are not the same. The same header-trickery you see in bogus email, false EHLOs, etc, are not useful concepts to carry accross.)
Anyway we were saying, I was going to do something? So I did. My solution? An XML-RPC server which you submit your comments to. This will return either:
|The comment is clean.
|The comment is spam, and the optional reason explains further
The server itself is very simple, and it comes with a collection of simple plugins which do the job of testing different things. More plugins will almost certainly be added over time - so far I've seen the lotsaurls plugin capturing most of the SPAM, although the DNSRBL lookup is also working nicely having captured live spam without my touching it. Huzzah. & etc..
If there is any interest feel free to let me know, although as the server is open, and the client side usage is documented I imagine other people could run with it easily enough too - avoiding me becoming a single point of failure.
For reference my initial change to hook this up live on the site was only a few lines of code - although later I did need to make some additional changes to make it cope with failures, & etc.