
Is my personal life of interest to you?

15 March 2009 21:50

This weekend I mostly fiddled around migrating machines from Xen hosting to KVM hosting. Ultimately it was largely a waste of time, due to various other factors. Still, with a bit of luck it will be possible to move the machines next week.

That aside, I spent a while updating my blogspam detection site. As a brief recap, this site offers a simple XML-RPC service which allows you to test whether incoming blog comments are spam or not.
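For anyone who hasn't used XML-RPC before, here is a rough sketch of what a client request to a service like this might look like. The method name `testComment` and the `ip` and `comment` fields are assumptions for illustration only, so check the service's own documentation for the real names. The sketch only builds the request body and does not contact any server.

```python
import xmlrpc.client


def build_test_request(ip, comment, method="testComment"):
    """Serialise one XML-RPC call carrying a single struct argument.

    The method name and the struct keys are hypothetical; they are not
    taken from the service's documentation.
    """
    params = ({"ip": ip, "comment": comment},)
    return xmlrpc.client.dumps(params, methodname=method)


# 192.0.2.1 is a reserved documentation address, used here as a placeholder.
payload = build_test_request("192.0.2.1", "buy viagra")
```

To actually send the call you would point `xmlrpc.client.ServerProxy` at the service's endpoint URL and invoke the method on it, passing the same struct.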

Originally this was put together to fight an invasion of comments submitted to the Debian Administration website. The site currently shows:

Site                        Spam  Non-Spam  % spam
debian-administration.org    238       372  60.98%

Depressing. But not as depressing as the real live stats, which show that since I last reset the counters there have been 36,995 spam comments vs. 1,206 non-spam comments. (live updating counters here)

Anyway, I updated the service today to add two new plugins, both of which are a little reactive.

The first new plugin is called "multilink" and is based upon the observation that spammers rarely know the markup of the site they are submitting comments to. This means you can frequently see submitted comments like this:

 <a href="http://spam.com">buy viagra</a>
 [url=http://spam.com]buy viagra[/url]
 [link=http://spam.com]buy me[/link]

Here we have three different styles of links - "a href", "link=", and "url=". I figure this is a clear indicator of a confused mind, or more likely a spammer.
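The idea is simple enough to sketch in a few lines. This is not the plugin's actual code, just an illustration in Python of the check described above: count how many distinct link styles appear in a comment and flag it if there is more than one. The function names, the patterns, and the threshold of one style are all my own assumptions.

```python
import re

# One pattern per link style mentioned above (hypothetical, not the plugin's own).
LINK_STYLES = {
    "html": re.compile(r"<a\s+[^>]*href\s*=", re.IGNORECASE),
    "bbcode_url": re.compile(r"\[url=", re.IGNORECASE),
    "bbcode_link": re.compile(r"\[link=", re.IGNORECASE),
}


def link_styles_used(comment):
    """Return the set of link styles that occur at least once in the comment."""
    return {name for name, pattern in LINK_STYLES.items() if pattern.search(comment)}


def looks_like_multilink_spam(comment, max_styles=1):
    """Flag a comment that mixes more link styles than a sane author would."""
    return len(link_styles_used(comment)) > max_styles
```

A comment using only one style, whichever it is, passes untouched; only the mixture is treated as suspicious.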

The second new plugin is designed to stop people who enter "<strong>" words. It is a little coarse, but it has actually had zero false positives in the real world, so I'm going to leave it live to see how it works out.
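As a rough illustration of how coarse this kind of check is, here is a minimal Python sketch, assuming the test is nothing more than "does the comment contain an opening strong tag". The name `contains_strong_markup` is made up for the example, and the real plugin may well do something different.

```python
import re

# Matches an opening <strong> tag in any letter case, with optional attributes.
STRONG_TAG = re.compile(r"<\s*strong\b", re.IGNORECASE)


def contains_strong_markup(comment):
    """Return True if the comment contains a literal <strong> tag."""
    return STRONG_TAG.search(comment) is not None
```

Note that the word "strong" on its own is fine; only the markup triggers the check.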

In happier news I'm just back from a trip to the beach. Sand rocks. Even if it wasn't windy enough for my kite ..

ObFilm: Dracula ("Bram Stoker's Dracula" - 1992)

| 2 comments

 

Comments on this entry

Ken Klaser at 22:38 on 15 March 2009

Hi Steve,
Thanks for all you do!

Spam Karma was a fabulous anti-spam tool for early WordPress blogs, and it worked in a basic mode without needing to access any third-party servers. It gave users a nice set of controls (that some felt were too complicated, but others loved). Unfortunately, the developer Dave gave up on it due to its non-paying nature, and it probably doesn't work on the latest WordPress blog software. Dave said he put it under GPL v2 in a Google Code repository at the end, some months ago. If you don't know anything about this, feel free to email me and I'll send you links to the critical information (I'd post the URLs here, but don't want this message to be spammy), though you may be able to find it yourself.

I thought the code itself might give you some ideas to implement in your own Perl project. Maybe you've already seen it.

The spam problem is absolutely horrendous for users who aren't programmers and who may want their own weblog.

Steve Kemp at 22:45 on 15 March 2009

Thanks!

I've taken a brief look at Spam Karma and it has some cute ideas. Mostly it seems to:

  • Check against dnsrbl
  • Do javascript fu.
  • Require challenges in forms
  • Look for entities and link counts.

There are some more tests, but I think a lot of them aren't appropriate for me, because they require hooks into the comment-generating form, and that's something I don't want to deal with.

Still definitely worth examining, and I'll take a more thorough look shortly.