Entries posted in June 2008

The problem with the English language is all those pesky words

Sunday, 29 June 2008

There exist many bayasian/statistical spam filters, ranging from products such as spambayes, and spamassassin, to crm114. Each of them works in their own way. Having used and tested almost all of them I've noticed a common flaw.

The vast majority of spam-filters struggle to correctly classify "419 scam" mails, lottery fraud, and similar mails.

Why is that? In general, having read hundreds of these mails, I can see several things that are common in these kind of the mails:

  • Mention of currency in both numeric and word forms. ($1,000,000 + 1 million US dollars)
  • Mention of a country / nationality (Sierra Lione, Nigerian)
  • Mention of a reference/claim number and often "official address".
  • Christian references.
  • Greetings such as "dear friend", and mentions of discretion/secrecy.
  • Size. (A scam mail is typically greater in length than an average spam mail).

Whilst none of these individually are indicative of a scam mail it is interesting to count their combined occurance.

I've written a toy program to count these things, and so far the success rate is >60% which is a reasonable start - providing this kind of detection occurs after normal filtering.

I may experiment further, but I figured a public query on scam detection might be appropriate.

Whilst the detecting a scam mail is a subset of detecting a spam email there are probably simplifications that may be made, and exploring those wouldn't be a bad thing.

ObQuote: Buffy.



Sorry I'm late. Work was murder.

Tuesday, 24 June 2008

I've spent a few hours recently looking at building RPM packages of GNU/Linux kernels, which has been a frustrating process.

There are many many online guides which give the impression that this is actually a pretty complex process. For example How To Compile A Kernel - The CentOS Way guide. (Did I mention how bad most of the howtoforge guides are recently?)

So, after fiddling around for an afternoon and getting lost I decided to abandon the process.

Here is a tested process for building a binary RPM kernel package:

cd linux-
make rpm

Yes this works just fine upon a Centos 5.x machine - I'm used to using make-kpkg to make a Debian kernel package, but it seems that if you just visit kernel.org and download the latest version you can build a RPM without any extra effort thanks to native support. Cool.

Now I need to work out how to create, host, and update a YUM repository. That looks fiddly and annoying too. XML. Eww. Any guides are most welcome - ultimately I need to package and host a "recent" kernel for Centos 4.x, Centos 5.x and Fedora Core 6-9 - each for i386 + amd64.

ObQuote: Spiderman



There is something evil there

Monday, 23 June 2008

So I've had a hectic few days, and I'm getting close to having caught up with the things that I've been sitting on whilst I've been away.

ObRandom: Several people, independantly, have told me within the past few days that "whilst" is not a real word. it is. End of ..

Some interesting things I've been working upon recently include a fun little firewall tool. Once upon a time I wrote a firewall script which worked like this:

`-- incoming.d
    |-- smtp
    |-- ssh
    `-- www
`-- outgoing.d
    |-- ssh
    |-- smtp
    |-- dns
    `-- icmp

When you executed the magic firewall script it would scan the incoming.d directory, and for each file it found lookup the relevant port in /etc/services. These port numbers would then be opened. And at the end you'd just have a "-j DROP".

After a long phone conversation to a colleague on Thursday/Friday of last week I've now reworked this idea anew. There is still the notion of filenames referring to what is allowed for a pair of directories (incoming.d/ + outgoing.d/) but even more flexability and no hardwired use of /etc/servvices.

I guess some ideas are just too simple to give up ..?

Anyway there are a plethora of different firewall applications of varying sophistication and complexity in the world. I don't really want to go out of my way to promote this one - but at the same time it might be a useful idea for somebody?

The next (work) job I have is determining how to make a "kernel" + "kernel-dev" RPM package based on Debian sources. Joy. Actually the more I look around the more fiddly, annoying, and troublesome I suspect this is going to be. Sigh.

ObQuote: The Grudgy



Manni - you're not dead yet.

Wednesday, 18 June 2008

Well I'm back and ready to do some fun work.

In the meantime it seems that at least one of my crash-fixes, from the prior public bugfixing, has been uploaded:

I'm still a little bit frustrated that some of the other patches I made (to different packages) were ignored, but I guess I shouldn't be too worried. They'll get fixed sooner or later regardless of whether it was "my" fix.

In other news I've been stalling a little on the Debian Administration website.

There are a couple of reasonable articles in the submissions queue - but nothing really special. I can't help thinking that the next article being a nice round number of 600 deserves something good/special/unique? hard to quantify, but definitely something I'm thinking. I guess I leave it while the weekend and if nothing presents itself I'll just go dequeue the pending pieces.

In other news I've managed to migrate the mail scanning service into a nicely split architecture - with minimal downtime.

I'm pleased that:

  • The architecture was changed massively from a single-machine orientated service to a trivially scalable one - and that this was essentially seamless.
  • My test cases really worked.
  • I've switched from being "toy" to being "small".
  • I've even pulled in a couple of new users.

Probably more changes to come once I've had a rest (but I guess I write about that elsewhere; because otherwise people get bored!).

The most obvious change to consider is to allow almost "instant-activation". I dislike having to manually approve and setup new domains, even if it does boil down to clicking a button on a webpage - so I'm thinking I should have a system in place such that you can sign up, add your domain, and be good to go without manual involvement. (Once DNS has propogated, obviously!)

Anyway enough writing. Ice-cream calls, and then I must see if more bugs have been reported against my packages...

ObQuote: Run Lola Run.

| No comments


Recent Posts

Recent Tags