25 January 2008 21:50
This week has mostly involved me getting my live mail filtering site up and running with a guinea pig or two.
This uses a custom user interface to allow users to manage the filtering settings for an entire domain:
- spam filtering.
- virus scanning.
- sender/recipient whitelisting.
- DNS-based blacklists.
In terms of implementation this is an SMTP proxy which is built upon the qpsmtpd framework. I've got both the user interface and a collection of plugins reading all data from a MySQL database.
The practical upshot is that if you use the service you'll get less spam, and anything that has been rejected will appear in an online browsable quarantine for a period of time, allowing you to view mistakes/rejected mails.
Provided you've got the spam-filtering plugin enabled for your domain, any mail you didn't want may be sent back to be trained as spam.
It scales nicely, doesn't appear to have lost any mail ever in real-world testing, and could be useful.
Tags: anti-spam, anti-virus, mail-scanning
27 January 2008 21:50
This weekend has been an interesting mix of activities. Mostly I've been tweaking my mail filtering service; now that it has more users it is more interesting to do that.
The basic process of mail-scanning is pretty simple, but there are some fun things in the mix which make it slightly more fiddly than I'd like.
The basic recipe goes something like this:
- Accept mail.
- Validate the mail is addressed to a domain hosted upon the machine.
- Do the spam filtering / magic (many steps missing here).
- If the mail should be rejected, archive it to a local Maildir folder and bounce it.
- If the mail should be accepted then forward it to the destination machine.
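The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's actual qpsmtpd plugin code: the names `HOSTED_DOMAINS`, `open_quarantine`, `looks_like_spam`, `handle_mail` and the `forward` callback are all hypothetical, and the spam check is a trivial stand-in for the many steps missing from the list.

```python
import mailbox

# Hypothetical: the set of domains this machine filters mail for.
HOSTED_DOMAINS = {"example.com"}


def open_quarantine(path):
    """Open (creating if needed) the local Maildir used for rejected mail."""
    return mailbox.Maildir(path, create=True)


def looks_like_spam(msg):
    """Stand-in for the real filtering steps (blacklists, scanners, ...)."""
    return "viagra" in (msg["Subject"] or "").lower()


def handle_mail(msg, recipient, quarantine, forward):
    """Return 'refused', 'rejected' or 'accepted' for a single message."""
    domain = recipient.rpartition("@")[2].lower()
    if domain not in HOSTED_DOMAINS:
        return "refused"        # not addressed to a domain hosted here
    if looks_like_spam(msg):
        quarantine.add(msg)     # archive first, so a mistake can be undone
        return "rejected"       # the caller then bounces the mail
    forward(msg, recipient)     # hand off to the destination machine
    return "accepted"
```

Archiving before rejecting is the important ordering here: the copy in the quarantine exists by the time the sender is told the mail was refused.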
The archiving of all rejected messages is a big win. It means that if there is a mistake in the handling of any mail we could undo it, retraining the spam database etc. It also provides, via a web page/RSS feed, a way for a user to see what a good job the filtering system is doing - by saying "Here's what you would have had ..".
Today I switched the way that the archived mail is displayed via the Web GUI. Previously I used some nasty Maildir parsing code, but now I'm running IMAP upon localhost - so the viewing of messages is a lot more straightforward. (via Net::IMAP::Simple.)
More interestingly, to most readers I'm sure, today I managed to take a new kite out for flying. A cold and windy day, but lots of fun. There was beer, pies, and near-death!
This was also the second weekend I carried out some painting of my front room. At this rate I'll have painted all four walls of the room in less than two months! (The last time I painted a room it took approximately six months to complete. Move furniture & paint one wall. Wait several weeks, then repeat until all walls are complete!)
Tags: diy, kite, mail-scanning, random
27 February 2008 21:50
For the past couple of days I've been working on some "easy hosting" setup for Debian. This is a continuation of my shell-script based solution, but intended to be much more dynamic.
The system makes it simple for us to deploy a Debian Etch installation, and allow users to create virtualhosts easily. (And by easily I mean by creating directories, and very little else.)
So, for example, to create a new website simply point the DNS for your domain example.com at the IP of your machine. Then run:
mkdir -p /srv/example.com/cgi-bin
mkdir -p /srv/example.com/htdocs
If you then want to allow FTP access to upload you may run:
echo "mysecretpass" > /srv/example.com/ftp-password
This will give you FTP access, username "example.com", password "mysecretpass". You'll be chrooted into the /srv/example.com/ directory.
All of this is trivial, via Apache's mod_vhost_alias and a simple script to split logfiles and generate statistics via webalizer for each domain. The only thing that I really needed to do was to come up with a simple shell script & cron entry to build up an FTP password file for pure-ftpd.
So here's where it gets interesting. The next job, obviously, is to handle mail for the domains. Under Debian it should be a matter of creating an appropriate /etc/exim4/exim4.conf - and ignoring the rest of the setup.
I'm getting some help with that, because despite knowing almost too much about SMTP these days I'm still a little hazy on Exim4 configuration.
I'm watching the recent Debian configuration packages system with interest, because although right now I'm not touching any configuration files I'm sure that it is only a matter of time.
In other news I cut prices, and am seeing a lot of interest in my mail-scanning.
Finally my .emacs file has been tweaked a lot over the previous few days. Far too much fun. (support files.)
Tags: apache2, exim4, mail-scanning, pure-ftpd, virtual hosting
1 March 2008 21:50
I've been re-reading RFC 2822 again over the weekend, for obvious reasons, and I'm amused I've not noticed this section in the past:
3.6.5. Informational fields
The informational fields are all optional. The "Keywords:" field
contains a comma-separated list of one or more words or
quoted-strings. The "Subject:" and "Comments:" fields are
unstructured fields as defined in section 2.2.1, and therefore may
contain text or folding white space.
subject = "Subject:" unstructured CRLF
comments = "Comments:" unstructured CRLF
keywords = "Keywords:" phrase *("," phrase) CRLF
Now we all know that emails have subjects, but how many people have ever used the Keywords: header, or the Comments: one?
It'd be nice if we could use these fields in mails - I can immediately think of "keywords" as tags, and I'm sure I'm not alone.
I've looked at multiple "tags for mutt" systems, but all of them fall down for the same reason. I can add tags to a mail, and limit a folder to those mails that contain a given tag. But I cannot do that for multiple folders and that makes them useless :(
Has anybody worked on a multi-folder tag system for Mutt? If so pointers welcome. If not I'd be tempted to create one.
I guess implementation would be very simple. There are three cases:
- Adding a tag
- Deleting a tag
- Finding all messages with a given tag.
The first two are easy. The third could be done by writing a cronjob to scan messages for Keywords: headers, and writing a simple index. That could then be used to populate an "~/Maildir/.tag-results" folder, via hardlinks, of all matching messages.
Better yet you could pre-populate ~/Maildir/.tag-$foo containing each message with a given tag. Then there's no searching required! (Although your cronjob would need to run often enough that when a tag was added to a message it would appear there within a reasonable timeframe.)
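A minimal sketch of the pre-populated variant, in Python, might look like the following. It is not the indexer mentioned in the update below; the function names are made up for illustration, and it assumes a Maildir++ layout where sub-folders live beside the top-level `cur` directory as dot-directories. It only looks at messages in `cur`, and it never removes a link when a tag is deleted.

```python
import os
import re
from email import policy
from email.parser import BytesHeaderParser


def message_tags(path):
    """Return the set of tags found in a message's Keywords: headers."""
    with open(path, "rb") as fh:
        headers = BytesHeaderParser(policy=policy.default).parse(fh)
    tags = set()
    for value in headers.get_all("Keywords", []):
        for raw in str(value).split(","):
            # Keep tag names safe to use as part of a folder name.
            tag = re.sub(r"[^a-z0-9_-]+", "-", raw.strip().lower()).strip("-")
            if tag:
                tags.add(tag)
    return tags


def source_folders(root):
    """Yield the top-level Maildir and every sub-folder except tag folders."""
    yield root
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if (name.startswith(".") and not name.startswith(".tag-")
                and os.path.isdir(os.path.join(path, "cur"))):
            yield path


def index_tags(root):
    """Hard-link each tagged message into <root>/.tag-<tag>/cur.

    Returns the number of new links created, so a second run over an
    unchanged mailbox returns 0.
    """
    created = 0
    for folder in source_folders(root):
        cur = os.path.join(folder, "cur")
        if not os.path.isdir(cur):
            continue
        for name in sorted(os.listdir(cur)):
            src = os.path.join(cur, name)
            if not os.path.isfile(src):
                continue
            for tag in message_tags(src):
                tag_dir = os.path.join(root, ".tag-" + tag)
                for sub in ("cur", "new", "tmp"):
                    os.makedirs(os.path.join(tag_dir, sub), exist_ok=True)
                dest = os.path.join(tag_dir, "cur", name)
                if not os.path.exists(dest):
                    os.link(src, dest)
                    created += 1
    return created
```

Run from cron as something like `index_tags(os.path.expanduser("~/Maildir"))`. Because the links are hard links they cost no extra disk space, but they do require the tag folders to live on the same filesystem as the mail.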
Update: I've written the indexer now. It works pretty quickly after the initial run, and is quite neat! See: tagging messages with mutt.
Tags: mail-scanning, mutt, rfc, tags
1 May 2008 21:50
Tonight I'm going to enjoy a nice long sleep after attending The Beltane Fire Festival yesterday evening.
I did manage to sort out an SSL certificate yesterday, before I went out. A lengthier process than expected because the SSL-registrar was annoying and mailed the admin address listed in whois for my domain, rather than an address upon the domain itself.
I guess they can't be blamed for that, and the registrar did forward on the request when begged, so it wasn't the end of the world. For reference I used godaddy.com, who sold me a 3 year SSL certificate for about £25.
Today I've been mostly catatonic because I had only two hours sleep last night. But one good piece of news was receiving a (postal) mail from Runa in response to the letter I had sent her some time ago.
Tags: mail-scanning, procmail, ssl
7 May 2008 21:50
Well a brief post about what I've been up to over the past few days.
An alioth project was created for the maintenance of the bash-completion package. I spent about 40 minutes yesterday committing fixes to some of the low-hanging fruit.
I suspect I'll do a little more of that, and then back off. I only started looking at the package because there was a request-for-help bug filed against it. It works well enough for me with some small local additions.
The big decision for the bash-completion project is how to go forwards from the current situation where the project is basically a large monolithic script. Ideally the openssh-client package should contain the completion for ssh, scp, etc.
Making that transition will be hard. But interesting.
In other news I submitted a couple of "make-work" patches to the QPSMTPD SMTP proxy - just tidying up a few minor cosmetic issues. I'm starting to get to the point where I understand the internals pretty well now, which is a good thing!
I love working on QPSMTPD. It rocks. It is basically the core of my antispam service and a real delight to code for. I cannot overemphasise that enough - some projects are just so obviously coded properly. Hard to replicate, easy to recognise...
I've been working on my own pre-connection system which is a little more specialised, making use of the Class::Pluggable library - packaged for Debian by Sarah.
(The world -> Pre-Connection/Load-Balancing Proxy -> QPSMTPD -> Exim4. No fragility there then ;)
I still need to sit down and work through the Apache2 bugs I identified as being simple to fix. I've got it building from SVN now though; so progress is being made!
Finally this weekend I need to sit down and find the time to answer Steve's "Team Questionnaire". Leave it any longer and it'll never get answered. Sigh.
ObQuote: Shooting Fish
18 June 2008 21:50
Well I'm back and ready to do some fun work.
In the meantime it seems that at least one of my crash-fixes, from the prior public bugfixing, has been uploaded:
I'm still a little bit frustrated that some of the other patches I made (to different packages) were ignored, but I guess I shouldn't be too worried. They'll get fixed sooner or later regardless of whether it was "my" fix.
In other news I've been stalling a little on the Debian Administration website.
There are a couple of reasonable articles in the submissions queue - but nothing really special. I can't help thinking that the next article, being a nice round number of 600, deserves something good/special/unique. Hard to quantify, but definitely something I'm thinking about. I guess I'll leave it until the weekend and if nothing presents itself I'll just go dequeue the pending pieces.
In other news I've managed to migrate the mail scanning service into a nicely split architecture - with minimal downtime.
I'm pleased that:
- The architecture was changed massively from a single-machine orientated service to a trivially scalable one - and that this was essentially seamless.
- My test cases really worked.
- I've switched from being "toy" to being "small".
- I've even pulled in a couple of new users.
Probably more changes to come once I've had a rest (but I guess I write about that elsewhere; because otherwise people get bored!).
The most obvious change to consider is to allow almost "instant-activation". I dislike having to manually approve and set up new domains, even if it does boil down to clicking a button on a webpage - so I'm thinking I should have a system in place such that you can sign up, add your domain, and be good to go without manual involvement. (Once DNS has propagated, obviously!)
Anyway enough writing. Ice-cream calls, and then I must see if more bugs have been reported against my packages...
ObQuote: Run Lola Run.
Tags: debian, mail-scanning
14 July 2008 21:50
Yesterday I was forced to test my backup system in anger, on a large scale, for the first time in months.
A broken package upgrade meant that my anti-spam system lost the contents of all its MySQL databases.
That was a little traumatic, to say the least. But happily I have a good scheme of backups in place, and only a single MX machine was affected.
So, whilst there was approximately an hour of downtime on the primary MX the service as a whole continued to run, and the secondary (+ trial tertiary) MX machines managed to handle the load between them.
I'm almost pleased I had to suffer this downtime, because it did convince me that my split-architecture is stable - and that the loss of the primary MX machine isn't a catastrophic failure.
The main reason for panicking was that I was late for a night in the pub. Thankfully the people I was due to meet believe in flexible approaches to start times - something I personally don't really believe in.
Anyway the mail service is running well, and I've set up "instant activation" now, combined with a full month of free service, which is helping attract more users.
Apart from that I've continued my plan of migrating away from Xen, and toward KVM. That is going well.
I've got a few guests up and running, and I'm impressed at how stable, fast, and simple the whole process is. :)
ObQuote: Brief Encounter
(That is a great film; and a true classic. Recommended.)
Tags: kvm, mail-scanning, xen
24 July 2008 21:50
Chronicle Theme Update
Gunnar Wolf made an interesting post about KVM today which is timely.
He points to a simple shell script for managing running instances of KVM which was a big improvement on mine - and so is worth a look if you're doing that stuff yourself.
Once I find time I will document my reasons for changing from Xen to KVM, but barring a few irritations I'm liking it a lot.
I made a new release of the chronicle blog compiler yesterday, mostly to update one of the themes.
That was for purely selfish reasons as I've taken the time to update the antispam protection site I'm maintaining. There have been some nice changes to make it scale more and now it is time for me to make it look prettier.
(A common theme - I'm very bad at doing website design.)
So now the site blog matches the real site.
ObQuote: Resident Evil
Tags: chronicle, kvm, mail-scanning, xen
5 August 2008 21:50
It is nice when you work for a company where you can say:
Tonight has been a productive evening, I guess the ice-lolly helped!
I managed to optimize the storage of rejected SPAM mail for my commercial service. That is something I've been obsessing over recently since the volume of SPAM is currently hovering around 2.5 million messages.
Still I suspect it is only a matter of weeks before I need to expand. The current setup has me using three machines:
- Primary machine runs:
  - Web Application
  - SMTP processing/filtering/delivery
- Secondary machine runs:
  - SMTP processing/filtering/delivery
- Offsite machine:
Ideally I'd like to split that up further so that I have a single machine running the web application (the part the user interacts with), a pair of MX machines, and the offsite machine doing the minimal work it does.
That way the incoming mail will not affect the application at all directly.
Thankfully the split should be trivial. The only hard part is finding a fast webhost that can offer me ~1GB of RAM, ~1000GB of disk space, and won't charge much. Ideally around £15/$30 a month. (hahaha! hahaha! ha!)
Tags: hosting, mail-scanning, random, work
2 September 2008 21:50
Yesterday I made a new release of the chronicle blog compiler. This fixes a bug in the handling of comments.
Previously comments were sorted badly, when they crossed a month boundary. Now they are always sorted first to last - which makes reading entries with multiple comments more natural.
Other than that I've been readying for the launch of a new MX machine for my mail filtering service. The process went pretty smoothly, and so I'm happy.
Still have that paranoid feeling that something will break, but at the very least I'll hear about it quickly thanks to the SMS-alerts!
ObMovie: Brief Encounter
Tags: chronicle, mail-scanning
21 September 2008 21:50
Every now and again the topic of SELinux arises locally.
I still believe it is:
- Theoretically interesting.
- Not ready for the prime time.
- Not something I ever consider using.
I kept quiet when the Should SELinux be standard topic was recently raised. But I personally believe the answer should be emphatically "No".
Anyway, change of subject. The recent "What do you look like right now" meme. I looked like this a couple of days ago. Today I have no hair.
In other news my mail scanning service has now reached a new record. Over the last 30 days it has rejected/archived over three million SPAM messages.
Three million messages over a month averages out at about 100,000 messages a day. Sustained. Nice.
Finally I really owe Runa a new letter. I will write it today.
ObQuote: Bill & Ted's Excellent Adventure
Tags: images, letters, mail-scanning, meme, pictures, selinux, stats
20 January 2009 21:50
Fabio Tranchitella recently posted about his new filesystem which really reminded me of an outstanding problem I have.
I do some email filtering, and that is set up in a nice distributed fashion. I have a web/db machine, and then I have a number of MX machines which process incoming mail, rejecting spam and queuing good mail for delivery.
I try not to talk about it very often, because that just smells of marketing. More users would be good, but I find explicit promotion & advertising distasteful. (It helps to genuinely consider users as users, and not customers even though money changes hands.)
Anyway I handle mail for just over 150 domains (some domains will receive 40,000 emails a day, others will receive 10 emails a week) and each of these domains has different settings, such as "is virus scanning enabled?" and "which are the valid localparts at this domain?", then there are whitelists, blacklists, all that good stuff.
The user is encouraged to fiddle with their settings via the web/db/master machine - but ultimately any settings are actually applied and used upon the MX boxes. This was initially achieved by having MySQL database slaves, but eventually I settled upon a simpler and more robust scheme: Using the filesystem. (Many reasons why, but perhaps the simplest justification is that this way things continue to work even if the master machine goes offline, or there are network routing issues. Each MX machine is essentially standalone and doesn't need to be always talking to the master host. This is good.)
On the master each domain has settings beneath /srv. Changes are applied to the files there, and to make the settings live on the slave MX boxes I can merely rsync the contents over.
Here's an anonymized example of a settings hierarchy:
| `-- enabled
| |-- action
| `-- zones
| |-- foo.example.com
| `-- bar.spam-house.com
| `-- english-only
| `-- admin_._admin
| |-- action
| |-- enabled
| `-- text
| |-- anonymous
| `-- bobby
| |-- action
| |-- enabled
| `-- text
| |-- bob
| |-- root
| |-- simon
| |-- smith
| |-- steve
| `-- wildcard
| |-- action
| |-- enabled
| `-- text
| `-- [blah]
| `-- simon
So a user makes a change on the web machine. That updates /srv on the master machine immediately - and then every fifteen minutes, or so, the settings are pushed across to the MX boxes where the incoming mail is actually processed.
Now ideally I want the updates to be applied immediately. That means I should look at using sshfs or similar. But also as a matter of policy I want to keep things reliable. If the main box dies I don't want the machines to suddenly cease working. So that rules out remotely mounting via sshfs, nfs or similar.
Thus far I've not really looked at the possibilities, but I'm leaning towards having each MX machine look for settings in two places:
- Look for "live" copies in /srv/
- If that isn't available then fall back to reading settings from /backup/
That way I can rsync to /backup on a fixed schedule, but expect that in everyday operation I'll get current/live settings from /srv via NFS, sshfs, or something similar.
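A minimal sketch of that two-place lookup, in Python, follows. The function names and the `<root>/<domain>/<setting>` layout are assumptions made for illustration (the real hierarchy is only shown anonymized above), and nothing here is the service's actual code. The choice of root is made per domain: if the live copy of a domain's directory can be listed it is trusted completely, so a setting the user deleted is not resurrected from the older backup.

```python
import os

# Assumed search order: live (remotely mounted) copy first, rsync'd copy second.
DEFAULT_ROOTS = ("/srv", "/backup")


def pick_root(domain, roots=DEFAULT_ROOTS):
    """Return the first root under which the domain's directory is readable."""
    for root in roots:
        try:
            os.listdir(os.path.join(root, domain))
        except OSError:
            continue            # missing, unmounted, or otherwise unreadable
        return root
    return None


def read_setting(domain, name, roots=DEFAULT_ROOTS):
    """Read one setting file, or return None if it is not set anywhere usable."""
    root = pick_root(domain, roots)
    if root is None:
        return None
    try:
        with open(os.path.join(root, domain, name)) as fh:
            return fh.read().strip()
    except OSError:
        return None
```

One caveat worth testing with any real network filesystem: a mount that hangs rather than failing would block `os.listdir` instead of raising an error, so the fallback only helps when the live copy fails quickly.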
My job for the weekend is to look around and see what filesystems are available and look at testing them.
Tags: filesystems, mail-scanning, random
6 October 2009 21:50
Recently I posted a brief tool for managing "dotfile collections". This tool was the rationalisation of a couple of ad hoc scripts I already used, and was a quick hack written in nasty bash.
I've updated my tool so that it is coded in slightly less nasty Perl. You can find the dotfile-manager repository online now.
This tool works well with my dotfile repository, and the matching, but non-public dotfiles-private repository.
I suspect that this post might flood a couple of feed aggregators, because I've recently updated my chronicle blog compiler with a new release. This release has updated all the supplied themes/templates such that they validate strictly, and as part of that I had to edit some of my prior blog entries to remove bogus HTML markup. (Usually simple things such as failing to escape & characters correctly, or using "[p][/P]" due to sloppy shift-driving.)
I should probably update the way I post entries, and use markdown or textile instead of manually writing HTML inside Emacs, but the habit has been here for too long. Even back when I used wordpress I wrote my entries in HTML...
Finally one other change in the most recent chronicle release is that the "mail-scanning.com theme" has been removed, as the service itself is no longer available. But all is not lost.
ObFilm: Blade II
Tags: chronicle, dotfile-manager, dotfiles, mail-scanning
3 December 2017 21:50
I've shuffled around all the repositories which are associated with the blogspam service, such that they're all in the same place and refer to each other correctly:
Otherwise I've done a bit of tidying up on virtual machines, and I'm just about to drop the use of qpsmtpd for handling my email. I've used the (perl-based) qpsmtpd project for many years, and documented how my system works in a "book":
I'll be switching to a pure exim4-based setup later today, and we'll see what that does. So far today I've received over five thousand spam emails:
steve@ssh /spam/today $ find . -type f | wc -l
Looking more closely though, over half of these rejections are "dictionary attacks", so they're not SPAM I'd see if I dropped the qpsmtpd-layer. Here's a sample log entry (for a mail that was both rejected at SMTP-time by qpsmtpd and archived to disc in case of error):
"reason":"Mail for juha not accepted at steve.fi",
"subject":"Viagra Professional. Beyond compare. Buy at our shop.",
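As a rough illustration of how the "over half" figure could be checked, here is a small Python sketch that tallies rejections by kind. It rests on two assumptions that the sample above does not confirm: that each log entry is one complete JSON object per line, and that every rejection for an unknown recipient has a reason shaped like the one shown ("Mail for X not accepted at Y"). The function name and category labels are made up for the example.

```python
import json
import re
from collections import Counter

# Assumption: unknown-recipient rejections all look like the sample reason.
UNKNOWN_RECIPIENT = re.compile(r"^Mail for \S+ not accepted at \S+$")


def tally_rejections(lines):
    """Count log entries per rough category of rejection reason."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            counts["unparseable"] += 1
            continue
        reason = entry.get("reason") if isinstance(entry, dict) else None
        if isinstance(reason, str) and UNKNOWN_RECIPIENT.match(reason):
            counts["unknown-recipient"] += 1
        else:
            counts["other"] += 1
    return counts
```

Feeding it the lines of a day's log and comparing `counts["unknown-recipient"]` with the total gives the proportion of rejections that were merely guesses at local parts.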
I suspect that with procmail piping to crm114, and a beefed up spam-checking configuration for exim4 I'll not see a significant difference and I'll have removed something non-standard. For what it is worth over 75% of the remaining junk which was rejected at SMTP-time has been rejected via DNS-blacklists. So again exim4 will take care of that for me.
If it turns out that I'm getting inundated with junk-mail I'll revert this, but I suspect that it'll all be fine.
Tags: blogspam, exim4, github, mail-scanning, perl, qpsmtpd