
There can be only one

31 August 2008 21:50

When volume becomes high enough you start to observe patterns in SPAM pretty easily. I think that this is primarily because people like to see patterns, whether they are present or not.

The trick is determining whether they are real patterns or not, and then to a lesser extent whether they are useful patterns.

For example, I host mail for a business domain. That means that incoming messages come primarily from existing customers, and very rarely from potential new ones.

In practice that means that email is expected to arrive from 9am until 6pm (+/- 2 hours). Email received at 2am? Either it is somebody working remotely, a foreign contact, or, much more likely, it is SPAM.

Now clearly you cannot dump all messages received at unusual times of the day, but it is a surprisingly robust SPAM indicator for that particular domain.
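As a rough illustration, the check itself is tiny. This is only a sketch in Python: the window below is the 9am to 6pm range widened by two hours on each side, the function name is made up for the example, and it assumes the receipt time has already been converted to the business's local time.

```python
from datetime import datetime, time

# Assumed window: 9am-6pm, widened by two hours on either side.
BUSINESS_START = time(7, 0)
BUSINESS_END = time(20, 0)


def outside_business_hours(received: datetime) -> bool:
    """Return True if the message arrived outside the expected window.

    `received` is assumed to be in the business's local time.
    """
    return not (BUSINESS_START <= received.time() < BUSINESS_END)
```

A True result is only a hint that the message deserves more suspicion, not a reason to discard it.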

All heuristics are fallible, but some are useful regardless.

I'd love to know what people can learn from their SPAM. This week I'm handling approximately 80,000 messages a day, per MX, which isn't huge (i.e. 2-3 million a month).

ObQuote: Highlander


Comments on this entry

Matt Simmons at 13:27 on 31 August 2008
It's possible to be very, very freaked out by patterns. Or massive changes.
Long before I had my blog, I used my LiveJournal to write occasional blurbs about technology.
At one point, I lost almost all of my spam. That was freaky. Of course, it came back later. Sorry the graph isn't there on that post anymore, but that host is long gone.
Anonymous at 19:51 on 31 August 2008
I wonder if you could configure SpamAssassin or similar to add a spamminess score for messages received during certain times of the day?
Steve Kemp at 20:48 on 31 August 2008

I've not used SpamAssassin, but given its extensible nature I'm sure it'd be very easy to add a point based on the time of day.

I'd expect you'd want to define a symbol such as OUTSIDE_BUSINESS_HOURS for mails sent after 7pm or before 8am, or something similar to that.

By itself that wouldn't be a useful thing to do. I guess you'd want it to be used on a per-domain basis, and only in combination with other tests.
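To make that concrete, here is a rough sketch in Python. It is not SpamAssassin's rule syntax or API: the domain table, the weight, the threshold and both function names are invented for illustration. It only shows the time-of-day point being added to whatever the other tests have scored, for domains that opt in.

```python
from datetime import datetime, time

# Hypothetical per-domain windows; domains not listed get no time-based score.
BUSINESS_HOURS = {
    "example.com": (time(8, 0), time(19, 0)),  # flag mail after 7pm or before 8am
}

OUTSIDE_BUSINESS_HOURS_SCORE = 1.0  # illustrative weight, not a tuned value
SPAM_THRESHOLD = 5.0                # illustrative threshold


def time_of_day_score(domain: str, received: datetime) -> float:
    """Score contributed by the OUTSIDE_BUSINESS_HOURS test for this domain."""
    window = BUSINESS_HOURS.get(domain)
    if window is None:
        return 0.0
    start, end = window
    inside = start <= received.time() < end
    return 0.0 if inside else OUTSIDE_BUSINESS_HOURS_SCORE


def is_spam(domain: str, received: datetime, other_scores: list[float]) -> bool:
    """Combine the time-of-day score with scores from the other tests."""
    total = sum(other_scores) + time_of_day_score(domain, received)
    return total >= SPAM_THRESHOLD
```

With these numbers a 2am message is never rejected on its timing alone; the extra point only matters when the other tests have already pushed it close to the threshold.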


Ben Finney at 01:52 on 1 September 2008
Note that "SPAM" in uppercase is the Hormel Foods product. Unsolicited bulk messages are "spam". (The former is an abbreviation, the latter isn't. Even though one was named in reference to the other.)
http://www.spam.com/legal/spam/
Steve Kemp at 09:08 on 1 September 2008

The "spam" vs "SPAM" battle is one that is lost.

I'll refer to email-based spam as SPAM, unless or until I'm forced to no longer do so.