When volume becomes high enough you start to observe patterns in SPAM pretty easily. I think that this is primarily because people like to see patterns, whether they are present or not.
The trick is determining whether they are real patterns or not, and then to a lesser extent whether they are useful patterns.
For example, I host mail for a business domain. That means that incoming messages come primarily from existing customers, and very rarely from potential new ones.
In practise that means that email is expected to arrive from 9am til 6pm (+/- 2 hours). Email received at 2AM? Either it is somebody working remotely, a foreign contact, or, much more likely, it is SPAM.
Now clearly you cannot dump all messages received at unusual times of the day, but it is a surprisingly robust SPAM indicator for that particular domain.
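As a rough sketch of that time-of-day heuristic: the window below is the 9am to 6pm range widened by two hours each side, and the function name, the weight, and the idea of feeding the result into a larger score are my assumptions for illustration, not a description of any particular filter. It also assumes the timestamp has already been converted to the domain's local time.

```python
from datetime import datetime, time

# Assumed window: 9am to 6pm, widened by two hours on each side.
WINDOW_START = time(7, 0)
WINDOW_END = time(20, 0)


def off_hours_score(received: datetime, weight: float = 1.0) -> float:
    """Return `weight` if the message arrived outside the window, else 0.0.

    `received` is assumed to be in the domain's local time. The result is
    only a hint to add to other evidence, never a reason to drop a message
    on its own.
    """
    if WINDOW_START <= received.time() <= WINDOW_END:
        return 0.0
    return weight


# The dates here are arbitrary examples.
print(off_hours_score(datetime(2024, 1, 15, 2, 0)))    # 2AM: prints 1.0
print(off_hours_score(datetime(2024, 1, 15, 10, 30)))  # mid-morning: prints 0.0
```

Returning a score rather than a yes/no keeps the indicator fallible by design: an off-hours message only gets nudged towards SPAM, it does not get dumped.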
All heuristics are fallible, but some are useful regardless.
I'd love to know what people can learn from their SPAM. This week I'm handling approximately 80,000 messages a day, per MX, which isn't huge (i.e. 2-3 million a month).
ObQuote: Highlander
Tags: spam
Long before I had my blog, I used my LiveJournal to write occasional blurbs about technology.
At one point, I lost almost all of my spam. That was freaky. Of course, it came back later. Sorry the graph isn't there on that post anymore, but that host is long gone.