About Archive Tags RSS Feed


Entries tagged spam

I should be so lucky, again.

10 September 2007 21:50

Recently the topic of spam on the Debian lists was revisited. I laugh at somebody who recieves 200 spam messages a day.

Here's my stats for yesterday:

                                          Total Mails    : 6399
                                          Total SPAM     : 6077
                                          Total Accepted : 322

                                          Spam Percentage: 94.97%

That's 6077 mails rejected at SMTP time via my filters, and only 322 mails accepted.

The breakdown of the spam rejected looks like this:

                                  Plugin      Count
                                   dnsbl       3755
                             hosts_allow        724
                             greylisting        661
                       check_earlytalker        303
                          check_spamhelo        238
             require_resolvable_fromhost        219
                           virus::clamav         79
                         check_badrcptto         75
                       check_badmailfrom         23

| No comments


The problem with the English language is all those pesky words

29 June 2008 21:50

There exist many bayasian/statistical spam filters, ranging from products such as spambayes, and spamassassin, to crm114. Each of them works in their own way. Having used and tested almost all of them I've noticed a common flaw.

The vast majority of spam-filters struggle to correctly classify "419 scam" mails, lottery fraud, and similar mails.

Why is that? In general, having read hundreds of these mails, I can see several things that are common in these kind of the mails:

  • Mention of currency in both numeric and word forms. ($1,000,000 + 1 million US dollars)
  • Mention of a country / nationality (Sierra Lione, Nigerian)
  • Mention of a reference/claim number and often "official address".
  • Christian references.
  • Greetings such as "dear friend", and mentions of discretion/secrecy.
  • Size. (A scam mail is typically greater in length than an average spam mail).

Whilst none of these individually are indicative of a scam mail it is interesting to count their combined occurance.

I've written a toy program to count these things, and so far the success rate is >60% which is a reasonable start - providing this kind of detection occurs after normal filtering.

I may experiment further, but I figured a public query on scam detection might be appropriate.

Whilst the detecting a scam mail is a subset of detecting a spam email there are probably simplifications that may be made, and exploring those wouldn't be a bad thing.

ObQuote: Buffy.



There can be only one

31 August 2008 21:50

When volume becomes high enough you start to observe patterns in SPAM pretty easily. I think that this is primarily because people like to see patterns, whether they are present or not.

The trick is determining whether they are real patterns or not, and then to a lesser extent whether they are useful patterns.

For example I host mail for a business domain. That means that incoming messages come primarily from existing customers, and very rarely from potential new ones.

In practise that means that email is expected to arrive from 9am til 6pm (+/-2hours) Email received at 2AM? Either it is somebody working remotely, a foreign contact, or much more likely it is SPAM.

Now clearly you cannot dump all messages received at unusual times of the day, but it is a surprisingly robust SPAM indicator for that particular domain.

All heuristics are fallable, but some are useful regardless..

I'd love to know what people can learn from their SPAM. This week I'm handling approximately 80,000 messages a day, per MX, which isn't huge (ie. 2-3 million a month).

ObQuote: Highlander



You like playing rough, huh?

29 September 2008 21:50

According to my small business advisor it is possible to advertise your company, service, or product on the internet.

Who knows what gem of advice they'll offer next?

In unrelated news all mail delivered to me personally in HTML-only format(s) will be dropped. I've given up being patient.

Finally OpenID - what a pain it is to implement! I've fought with it over the weekend, in amongst rewiring my lighting. Setting up a Perl script to authenticate to an OpenID server is just gnarly. (I now have motion-sensitive lighting in my bathroom, which my kitten loves, and radio controlled lighting in the bedroom. Lazyness is ..)

ObQuote: Resident Evil Extinction

| 1 comment


I think I did pretty well under the circumstances

13 October 2008 21:50


This will be the last time I talk about this, but here's my anti-rubbish procmail filter.

It correctly copes with:

  • Foreign character sets that I can't read.
  • Bounces from joe-jobs.
  • Malformed emails.

Obviously edit to suit your tastes. Especially with regard to character sets - it is a wide brush which is tarring a group unfairly. That said in practise it works for me.

If you want real antispam filtering then you should probably be looking at externalising it, or having a layered approach.

ObQuote: Citizen Kane

| 1 comment


My hovercraft is full of eels.

28 June 2009 21:50

Recently I've been seeing an awful lot more bounced mail addressed to my domains, to the extent that I now wonder whether they are deliberate "attacks".

Over the past four or five years I'd expect to receive one joe-job attack every six months. Over the past two that's risen to once every two months. For the past two months its been once a week.

I run several domains on my Xen guest, and most of those domains rarely have mail received, so there are only a few localparts. (A "localpart" is the bit before the @ sign in an email address.)

My main domain is steve.org.uk and unfortunately this was historically setup with "catchall" behaviour. I used that wildcard expansion pretty seriously so I had localparts such as "slashdot.org", "lwn.net", etc. Over time I've stopped making up new addresses and just stuck with "steve".

Still I'd never quite gotten round to enumerating all valid localparts, instead I tried to mitigate against these rare bounce storms with various simple hacks. For example the following procmail recipe to file away bounces:

#  Bounces

However this doesn't work as well as it used to - too many idiots people are using challenge/response systems so I'll receive a reply to a mail I didn't send which doesn't look like a bounce (ie. There is a real envelope sender.)

In short blocking bounces by detecting an empty envelope sender is not a complete strategy these days. I started down the heuristic path blocking mail to "unlikely" localparts via patterns such as:

[0-9]@        DENY  Localparts never end in digits
,             DENY  Localparts never contain a comma
|             DENY  Localparts never contain PIPES.
^([^a-zA-Z])  DENY  Localparts start with a-z/A-Z
"             DENY  Quotes are never used in accounts on this system:
'             DENY  Quotes are never used in accounts on this system:

That was actually a simple change to make, via the addition of a new QPSMTPD plugin and it managed to block a lot of the bounceback spam - regardless of the envelope sender. For example:

IP:    sender:<> Recipient:treacherously9@steve.org.uk
IP: sender:<> Recipient:envoyz0@steve.org.uk

Blocking "unlikely" localparts wasn't perfect, but without implementing BATV or enumerating valid localparts there wasn't too much else that I could do. In terms of numbers yesterday I blocked just over 18,500 messages with these six rules.

I also wrote a couple of cronjobs to look at the contents of the Automated.bonces folder so that I could add per-user rejections on the specific addresses being received - with some whitelisting.

(For example if I received 20+ bounces to fluffy32qp@steve.org.uk within the space of ten minutes I'd drop further mails to that address automatically.)

Anyway enough is enough. Today I woke up to just over 40,000 replies to mails I didn't send. I've now scanned my mail directories for all the email addresses I've ever used and will now only accept mail destined to those localparts.

Thankfully it turned out that since 1999 (when steve.org.uk was registered) I've only used about 150 distinct localparts, and many of those are now obsolete. So hopefully I'll now have less of a problem.

It seems to be paying off already:   wpc0505.host7x24.com  <>  virtual_rcpt_ok
    901     mail to subtotalingxa@steve.org.uk not accepted here (#5.1.1)   cobra.compukey.net    <>  virtual_rcpt_ok
     901     mail to suctionsw@steve.org.uk not accepted here (#5.1.1)   box19.fuitadnet.com   <>   virtual_rcpt_ok
     901     mail to reappearcum@steve.org.uk not accepted here (#5.1.1)

In the future this means I could still get flooded with bounces, but there will be two outcomes:

  • The bounces will not hit valid localparts and will be dropped easily, quickly, and cheaply.
  • The bounces will hit valid localparts:
    • Real bounces will end up in Automated.bounces/
    • Challenge/Response things will still reach me. Sigh.

Still this is progress and I can steal some ideas from this great spam filtering service (ahem) to improve the handling of those! (I explicitly chose to use a similar but different system for my personal mails. Even though my support system is on another box I want to avoid problems where failures requiring human intervention are swallowed in the same way that the original one was. Those kind of reasons mandate a similar system but different implementation.)

I guess I could publish some of the qpsmtpd plugins I use locally virtual_rcpt_ok, virtual_badusers, rcpt_pattern_test, etc. Then again most people who do funky things with qpsmtpd will have plenty of choice already.

ObFilm: Monty Python's Flying Circus. (OK technically not a film. Sums up my mood though.)



Let go of the handle.

21 February 2010 21:50

I don't talk about SPAM publicly these days, for reasons that are probably self-explanatory.

However this is just insane:

  • Saturday 20th February 2010: Registered a new domain.
  • Sunday 21st February 2010: Received first spam.

Currently at 40+ SPAM mails and rising; all mails addressed to "postmaster@", rather than any past users of the domain. (I can see from http://archive.org that the domain was last active in 2008.)

ObSubject: The Goonies

| No comments


sysadmin.im considered harmful

25 July 2010 21:50

Not for the first time I find my blog content copied and hosted elsewhere. This time via http://sysadmin.im/.

Mostly I care little if people rehost my content. But when people claim to have written it (e.g. "Posted by Admin") I get annoyed.

No explicit contact details are posted, probably to avoid complaints.

Update: Fixed URL. Stupid do.tted.na.mes.



IPv6 email

26 February 2011 21:50

I've been slowly moving towards full IPv6 usage on my main machines for the past few months. My main servers all have IPv6 setup and appropriate DNS records in place.

This weekend I configured my mailserver, which is based upon QPSMTPD & exim4, to be available on IPv6 too. Previously it would send mail via IPv6 where appropriate, but only receive mail via IPv4.

QPSMTPD I've written about a lot in the past, and indeed I did commercial things with it for a year or two, but in short it is more of an SMTP framework than an actual mailserver.

These days I use a small collection of plugins which test incoming mail in various ways, and either:

  • Reject the mail at SMTP time, causing a bounce, and store a copy of the rejected mail in a quarantine.
  • Accept the mail, and pass it on to exim4 for (local) delivery.

My plugins are pretty simple, but I've made a few changes for the brave new IPv6 world:

  • Breakdown reverse-DNS checks into IPv4 & IPv6 flavours.
  • Avoid using DNSBL for IPv6 addresses.

I reject (+ archive) about 8,000 SPAM messages a day. So far I've seen precisely zero SPAM mails be received via IPv6; though I'm sure that won't last for long!

My reject archive looks like this:

steve@steve:~$ tree -d -L 2 /spam/
|-- 23
|   |-- debian-administration.org
|   |-- mail-scanning.com
|   `-- steve.org.uk
|-- 24
|   |-- debian-administration.org
|   `-- steve.org.uk
|-- 25
|   |-- debian-administration.org
|   |-- mail-scanning.com
|   `-- steve.org.uk
|-- 55
|   |-- debian-administration.org
|   |-- mail-scanning.com
|   |-- steve.org.uk
|   `-- stolen-souls.com
|-- 56
|   |-- debian-administration.org
|   |-- steve.org.uk
|   `-- stolen-souls.com
|-- today -> /spam/56
`-- yesterday -> /spam/55

(Here "N" is the day of the year - Think of this as "date +%j". I rotate such that I keep 32 days of past SPAM mail, for reference/amusement/mistake-catching.)

ObQuote: "I am already grown up, I just get older. " - Leon

| No comments


Some domains just don't learn

5 February 2012 21:50

For the past few years the anti-spam system I run has been based on a simplified version of something I previously ran commercially.

Although the code is similar in intent there were both explicit feature removals, and simplifications made.

Last month I re-implimented domain-blacklisting - because a single company keeps ignoring requests to remove me.

So LinkedIn.com if you're reading this:

  • I've never had an account on your servers.
  • I find your junk mail annoying.
  • I suspect I'll join your site/service when hell freezes over.

I've also implemented TLD-blacklisting which has been useful.

TLD-blacklisting in my world is not about blocking mail from foo@bar.ph (whether in the envelope sender, or the from: header), instead it is about matching the reverse DNS of the connecting client.

If I recieve a connection from and the reverse DNS of that IP address matches, say, /\.sa$/i then I default to denying it.

My real list is longer, and handled via files:

steve@steve:~$ ls /srv/_global_/blacklisted/tld/ -C
ar  br  cn  eg  hr  in  kr  lv  mn  np  ph  ro  sg  tg  ua  ve  zw
aw  cc  cy  gm  hu  is  kz  ma  my  nu  pk  rs  sk  th  ug  vn
be  ch  cz  gr  id  it  lk  md  mz  nz  pl  ru  su  tr  uy  ws
bg  cl  ec  hk  il  ke  lt  mk  no  om  pt  sa  sy  tw  uz  za

On average I'm rejecting about 2500 messagse a day at SMTP-time, and 30 messages, or so, hit my SPAM folder after being filtered with CRM114 after being accepted for delivery. (They are largely from @hotmail and @yahoo, along with random compromised machines. The amount of times I see a single mail from a host with RDNS mysql.example.org is staggering.).

(Still looking forward to the development of Haraka, a node.js version of qpsmtpd.)

ObQuote: "Mr. Mystery Guest? Are you still there? " - Die Hard



Bye-bye AOL

9 April 2012 21:50

Today I took that final step:

touch /srv/_global_/blacklisted/domains/aol.com

I remember, even a couple of years ago, I had friends who would mail me from their @aol.com email addresses. These days people have moved on.

The two single biggest mail-providers I see in terms of spam are:

  • @yahoo.com - 832 in the past nine days.
  • @aol.com - 242 in the past nine days.

I'd like to drop @yahoo.com but I have some (misguided) friends who continue to use it to mail me. I might start dropping non-friend mails from that domain, but that's a bigger job.

Yes, this is a dull entry. Sorry. My existing bathroom has been ripped out and turned into this as a stepping stone into its new incarnation. I'm trapped in my office. Dust almost everywhere. Noise everywhere else.

ObQuote: "There can be no understanding between the hand and the brain unless the heart acts as mediator. " - Metropolis



A small assortment of content

10 April 2014 21:50

Today I took down my KVM-host machine, rebooting it and restarting all of my guests. It has been a while since I'd done so and I was a little nerveous, as it turned out this nerveousness was prophetic.

I'd forgotten to hardwire the use of proxy_arp so my guests were all broken when the systems came back online.

If you're curious this is what my incoming graph of email SPAM looks like:

I think it is obvious where the downtime occurred, right?

In other news I'm awaiting news from the system administration job I applied for here in Edinburgh, if that doesn't work out I'll need to hunt for another position..

Finally I've started hacking on my console based mail-client some more. It is a modal client which means you're always in one of three states/modes:

  • maildir - Viewing a list of maildir folders.
  • index - Viewing a list of messages.
  • message - Viewing a single message.

As a result of a lot of hacking there is now a fourth mode/state "text-mode". Which allows you to view arbitrary text, for example scrolling up and down a file on-disk, to read the manual, or viewing messages in interesting ways.

Support is still basic at the moment, but both of these work:

  -- Show a single file
  show_file_contents( "/etc/passwd" )
  global_mode( "text" )


function x()
   txt = { "${colour:red}Steve",
           "Made this work" }
   show_text( txt )
   global_mode( "text")


There will be a new release within the week, I guess, I just need to wire up a few more primitives, write more of a manual, and close some more bugs.

Happy Thursday, or as we say in this house, Hyvää torstai!

| 1 comment


On the names we use in email

18 October 2014 21:50

Yesterday I received a small rush of SPAM mails, all of which were 419 scams, and all of them sent by "Mrs Elizabeth PETERSEN".

It struck me that I can't think of ever receiving a legitimate mail from a "Mrs XXX [YYY]", but I was too busy to check.

Today I've done so. Of the 38,553 emails I've received during the month of October 2014 I've got a hell of a lot of mails with a From address including a "Mrs" prefix:

"Mrs.Clanzo Amaki" <marilobouabre14@yahoo.co.jp>
"Mrs Sarah Mamadou"<investment@payment.com>
"Mrs Abia Abrahim" <missfatimajinnah@yahoo.co.jp>
"Mrs. Josie Wilson" <linn3_2008@yahoo.co.jp>
"Mrs. Theresa Luis"<tomaslima@jorgelima.com>

There are thousands more. Not a single one of them was legitimate.

I have one false-positive when repeating the search for a Mr-prefix. I have one friend who has set his sender-address to "Mr Bob Smith", which always reads weirdly to me, but every single other email with a Mr-prefix was SPAM.

I'm not going to use this in any way, since I'm happy with my mail-filtering setup, but it was interesting observation.

Names are funny. My wife changed her surname post-marriage, but that was done largely on the basis that introducing herself as "Doctor Kemp" was simpler than "Doctor Foreign-Name", she'd certainly never introduce herself ever as Mrs Kemp.

Trivia: In Finnish the word for "Man" and "Husband" is the same (mies), but the word for "Woman" (nainen) is different than the word for "Wife" (vaimo).



Recommendations for software?

15 September 2018 12:01

A quick post with two questions:

  • What spam-filtering software do you recommend?
  • Is there a PAM module for testing with HaveIBeenPwnd?
    • If not would you sponsor me to write it? ;)

So I've been using crm114 to perform spam-filtering on my incoming mail, via procmail, for the past few years.

Today I discovered it had archived about 12Gb of my email history, because I'd never pruned it. (Beneath ~/.crm/.)

So I wonder if there are better/simpler/different Bayesian-filters out there at that I should be switching to? Recommendations welcome - but don't say "SpamAssassin", thanks!

Secondly the excellent Have I Been Pwned site provides an API which allows you to test if a password has been previously included in a leak. This is great, and I've integrated their API in a couple of my own applications, but I was thinking on the bus home tonight it might be worth tying into PAM.

Sure in the interests of security people should use key-based authentication for SSH, but .. most people don't. Even so, if keys are used exclusively, a PAM module would allow you to validate the password which is used for sudo hasn't previously been leaked.

So it seems like there is value in a PAM module to do a lookup at authentication-time, via libcurl.