
 

Five grand a head

5 August 2008 21:50

It is nice when you work for a company where you can say:

"Ice-lolly break..."

The response?

"Me too!"

Tonight has been a productive evening; I guess the ice-lolly helped!

I managed to optimize the storage of rejected SPAM mail for my commercial service. That is something I've been obsessing over recently since the volume of SPAM is currently hovering around 2.5 million messages.

Still, I suspect it is only a matter of weeks before I need to expand. The current setup has me using three machines:

  • Primary machine runs:
    • Web Application
    • SMTP processing/filtering/delivery
  • Secondary machine runs:
    • SMTP processing/filtering/delivery
  • Offsite machine:

Ideally I'd like to split that up further so that I have a single machine running the web application (the part the user interacts with), a pair of MX machines, and the offsite machine doing the minimal work it does.

That way the incoming mail will not directly affect the application at all.

Thankfully the split should be trivial. The only hard part is finding a fast webhost that can offer me ~1GB of RAM, ~1000GB of disk space, and won't charge much. Ideally around £15/$30 a month. (hahaha! hahaha! ha!)

ObQuote: Léon


 

Comments on this entry

Tiago Faria at 20:40 on 5 August 2008
I would LOVE to see the code powering that service. :P
toupeira at 20:58 on 5 August 2008
Have you ever considered rejecting spam during the SMTP session, so you don't have to actually store it on your servers? We implemented such a system using Exim4 at our company about a year ago, and we never had any complaints. The sender receives a normal mailer error if a mail was detected as spam, which most people probably don't understand but at least they know that their mail wasn't delivered. Of course you could also only reject high-scoring spam, and still store the dubious cases on the server.
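
As a rough illustration of the policy described in this comment (reject high-scoring spam during the SMTP session, accept but set aside the dubious cases), a minimal Python sketch might look like the following. The function name, the score scale, and both thresholds are assumptions made for illustration; nothing here is taken from an actual Exim4 configuration.

```python
def smtp_verdict(score, reject_at=10.0, quarantine_at=5.0):
    """Map a spam score to an SMTP reply code and an action.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if score >= reject_at:
        # 550 is a permanent failure: the sending server generates the
        # bounce, so nothing needs to be stored locally.
        return 550, "reject"
    if score >= quarantine_at:
        # Dubious mail is accepted (250) but kept aside for review.
        return 250, "quarantine"
    # Everything else is accepted and delivered normally.
    return 250, "deliver"
```

With these assumed thresholds, only the middle band of scores ever needs storage on the server.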
Steve Kemp at 20:59 on 5 August 2008

Currently the code behind the service is closed, but that may well change in the future. (I'd love to release it; but only if I could be sure that copy-cats wouldn't "steal" my users, and prevent me from getting more!)

The core of the service is a collection of Perl modules which manage the creation, manipulation, and deletion of "domains", "users", and per-domain settings such as "is the virus scanner enabled for this domain?".

So, that's the core - a collection of objects which maintain state about a domain, and the settings the user has chosen to enable (such as blacklists, whitelists, Bayesian spam filtering, virus scanning, DNS blacklists, etc).


The objects are manipulated via the web-based control panel (and also by email), and are consulted in a read-only fashion by the mail handler itself.

The SMTP server is the qpsmtpd SMTP proxy. This is a beautifully flexible SMTP-proxy server written in Perl. This server is so minimal that almost everything is written as a plugin - and that's where my helper objects come in.

I've written about 30 different qpsmtpd plugins, each of which reads its settings from the objects mentioned above - and reacts accordingly.

So:

  1. I have a database of settings for a domain.
  2. These database settings are encapsulated in MF::Domain, MF::Domain::SpamFiltering, MF::Domain::Users, etc.
  3. The web/email systems allow these objects to be modified.
  4. The SMTP-server uses these objects to decide what to do at every step of the SMTP transaction.
  5. Ultimately, depending upon the settings for the target domain, a mail is either rejected (with a copy archived) or delivered.
  6. Currently I've only published one of my qpsmtpd plugins, but more may follow once I've decoupled them from my site. (As so many of the plugins essentially start by looking for the recipient of an email, and finding the per-domain settings in the database, many of them are very tightly coupled to my setup.)
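
The flow in the list above can be sketched roughly as follows. This is a Python illustration only, not the actual Perl code: the class names, fields, and the `handle` function are all hypothetical stand-ins for the MF::Domain objects and the qpsmtpd plugins, and the spam/virus flags are assumed to come from separate filters.

```python
from dataclasses import dataclass, field


@dataclass
class DomainSettings:
    """Hypothetical stand-in for the per-domain settings objects."""
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    spam_filtering: bool = True
    virus_scanning: bool = True


@dataclass
class Message:
    sender: str
    recipient: str
    looks_like_spam: bool = False  # assumed output of a spam filter
    has_virus: bool = False        # assumed output of a virus scanner


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def handle(message, settings_by_domain, quarantine):
    """Decide a message's fate from the recipient domain's settings.

    Settings are only read here, never modified. Rejected mail for a
    known domain is appended to `quarantine` so it can be reviewed.
    """
    settings = settings_by_domain.get(domain_of(message.recipient))
    if settings is None:
        return "reject"  # not a hosted domain: nothing to archive

    if message.sender in settings.whitelist:
        return "deliver"

    reason = None
    if message.sender in settings.blacklist:
        reason = "blacklisted sender"
    elif settings.virus_scanning and message.has_virus:
        reason = "virus"
    elif settings.spam_filtering and message.looks_like_spam:
        reason = "spam"

    if reason is None:
        return "deliver"
    quarantine.append((message, reason))  # keep a copy of what was rejected
    return "reject"
```

The point of the sketch is the shape of the design: the settings live in one place, and the mail handler only consults them to reach a reject-and-archive or deliver decision.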

    In terms of the technology I'm using Apache 2.x, CGI::Application, & Perl for the control panel alongside qpsmtpd & exim4 for the SMTP handling. There is also some of Danga's memcached thrown in to speed things up.

    60% of the complexity of the service is ensuring that all the mail comes into one central quarantine area where the web application may view it. When you're archiving several GB of email that process is ... fun. But without the online, browsable, searchable quarantine I think my service would be less fun.

    I'd be happy to give more details privately if you're curious - but I hope that gives a roughly useful overview - and questions are always welcome.

    Did I mention 30 days of free service for new customers? ;)


Steve Kemp at 21:09 on 5 August 2008

toupeira: Yes, I have considered not archiving the rejected messages, but I was always keen on keeping copies.

From my point of view having a searchable, viewable, archive of all rejected messages has multiple purposes:

Catching Errors

No system is perfect. I think my own is pretty good, but I accept that there are times when it is less good than it could be.

Because there is an archive of rejected messages, the recipient has the option of looking for messages which they haven't seen because of an error, without having to wait for the sender to notice the bounce and contact them out of band.

Service Differentiation

Many similar services have only the option of forwarding messages which are spam to a single email address, or hiding them somewhere you can't view them.

My service is nice and open. Almost every message that is rejected may be viewed, and redelivered with a couple of mouse-clicks.

Showing off

Because there is an archive of every rejected message you can immediately see how well the service is working.

Similarly, a user can see that not much mail is being rejected, and could choose to save their money. I want to have satisfied customers, and when people see the quarantine area filling up they are largely impressed, pleased, and surprised.

Removing the quarantine means that any errors will only be noticed if the sender re-mails the recipient, and any figures of rejected/accepted mails aren't open to inspection and questioning.

Despite the pain, I love having the ability to see which mail has been rejected for my domain(s). True, I haven't the time or the patience to go through it all to look for falsely caught mail but, and here is the important thing, I could if I wanted to.