Fabio Tranchitella recently posted about his new filesystem, which reminded me of an outstanding problem of my own.
I do some email filtering, and that is set up in a nice distributed fashion. I have a web/db machine, and then I have a number of MX machines which process incoming mail, rejecting spam and queuing good mail for delivery.
I try not to talk about it very often, because that just smells of marketing. More users would be good, but I find explicit promotion & advertising distasteful. (It helps to genuinely consider users as users, and not customers, even though money changes hands.)
Anyway, I handle mail for just over 150 domains (some domains will receive 40,000 emails a day, others 10 emails a week), and each of these domains has different settings, such as "is virus scanning enabled?" and "which are the valid localparts at this domain?". Then there are whitelists, blacklists, all that good stuff.
The user is encouraged to fiddle with their settings via the web/db/master machine - but ultimately any settings are actually applied and used upon the MX boxes. This was initially achieved by having MySQL database slaves, but eventually I settled upon a simpler and more robust scheme: using the filesystem. (Many reasons why, but perhaps the simplest justification is that this way things continue to work even if the master machine goes offline, or there are network routing issues. Each MX machine is essentially standalone and doesn't need to be always talking to the master host. This is good.)
On the master each domain has settings beneath /srv. Changes are applied to the files there, and to make the settings live on the slave MX boxes I can merely rsync the contents over.
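The push step can be as simple as one rsync per slave. A minimal sketch, assuming key-based ssh and hypothetical hostnames (mx1.example.com and so on - not my real machine names):

```shell
#!/bin/sh
# Mirror the settings hierarchy to one destination.
# --delete makes the destination an exact mirror, so a setting
# file removed on the master disappears on the slave too.
push_settings() {
    src="$1"
    dst="$2"
    rsync -az --delete "$src/" "$dst/"
}

# On the master, this would be called once per MX box, e.g.:
#   for host in mx1.example.com mx2.example.com; do
#       push_settings /srv "root@${host}:/srv"
#   done
```

The trailing slashes matter to rsync: they mean "copy the contents of this directory", rather than creating an extra nested directory on the slave.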
Here's an anonymized example of a settings hierarchy:
/srv/foo.com/
|-- basics
| `-- enabled
|-- dnsbl
| |-- action
| `-- zones
| |-- foo.example.com
| `-- bar.spam-house.com
|-- language
| `-- english-only
|-- mx
|-- quarantine
| `-- admin_._admin
|-- spam
| |-- action
| |-- enabled
| `-- text
|-- spamtraps
| |-- anonymous
| `-- bobby
|-- uribl
| |-- action
| |-- enabled
| `-- text
|-- users
| |-- bob
| |-- root
| |-- simon
| |-- smith
| |-- steve
| `-- wildcard
|-- virus
| |-- action
| |-- enabled
| `-- text
`-- whitelisted
|-- enabled
|-- hosts
|-- subjects
| `-- [blah]
|-- recipients
| `-- simon
`-- senders
|-- [email protected]
|-- @someisp.com
`-- [email protected]
So a user makes a change on the web machine. That updates /srv on the master machine immediately - and then every fifteen minutes, or so, the settings are pushed across to the MX boxes where the incoming mail is actually processed.
Now ideally I want the updates to be applied immediately, which suggests looking at sshfs or similar. But as a matter of policy I also want to keep things reliable: if the main box dies, I don't want the MX machines to suddenly cease working. So that rules out relying solely on a remote mount via sshfs, NFS, or similar.
Thus far I've not really looked at the possibilities, but I'm leaning towards having each MX machine look for settings in two places:
- Look for "live" copies in /srv/
- If that isn't available then fall back to reading settings from /backup/
That way I can rsync to /backup on a fixed schedule, but expect that in everyday operation I'll get current/live settings from /srv via NFS, sshfs, or something similar.
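The two-place lookup described above is just a read with a fallback. A sketch of how an MX box might fetch one setting (the roots are parameters here purely so it's testable; on a real slave they'd be the fixed paths /srv and /backup):

```shell
#!/bin/sh
# Read one setting file, preferring the live (network-mounted)
# copy and falling back to the locally rsync'd backup when the
# mount is unavailable.
get_setting() {
    live="$1"      # e.g. /srv (remote mount)
    backup="$2"    # e.g. /backup (local rsync'd copy)
    rel="$3"       # e.g. foo.com/spam/action
    for root in "$live" "$backup"; do
        if [ -r "$root/$rel" ]; then
            cat "$root/$rel"
            return 0
        fi
    done
    return 1       # setting present in neither place
}
```

One subtlety worth testing for: a dead NFS mount can hang reads rather than fail them cleanly, so "isn't available" may need a timeout rather than a simple existence check.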
My job for the weekend is to look around and see what filesystems are available and look at testing them.
Obmovie:Alive
Tags: filesystems, mail-scanning, random