Getting a Handle on Email Spam

One of the most annoying items that we all face is undoubtedly email spam. Unfortunately, there is little that we can do about it, and so I figured that it was time to try and understand it better. Not so long ago, I made two changes that made this not only possible, but useful. First, I switched from using POP3 email to IMAP. This in turn led to the other change, which was to (finally) dump Outlook and start using Thunderbird exclusively. In fact, the only time that I’ve used Outlook in the last few months was to help a family member figure out what was going wrong. But that’s a story for another time.

These two changes have allowed me to look at email spam differently. You probably already know that Outlook handles junk email. I’ve mentioned it before. More than once, actually. Unfortunately, the Outlook junk filter doesn’t have much of a brain to it. I’m sure it does something, but it never seems to tell you anything about it, and operating in a void doesn’t help anyone. Thunderbird, meanwhile, doesn’t tell you a lot either, but it does interact with the most popular spam filter on the planet (SpamAssassin), and it also allows you to tag items as junk and not junk, which supposedly helps it classify future items. I don’t know if that helps or not, but it feels like it does, and sometimes that makes all the difference.

The real reason that I switched from Outlook to Thunderbird was that the IMAP support in Outlook stinks. Really, really stinks. And though webmail has come a long way, I like having an IMAP client that is useful. So the spam functions are just icing on the cake. What’s more, using the built-in filtering functions that you can find on most servers, you can train your spam processing to get even better at marking your spam as such through a SpamAssassin tool called sa-learn.

You do have to keep an eye on this, because it seems to have a limit on the amount of data that it can process (at least at pair Networks, or pN), so you can’t just send it every spam message you’ve ever received. But the first step is to designate a folder as your spam folder. In Thunderbird, you want to call this folder Junk in order to get a nice little icon, but you don’t have to do that – whatever you set up in the settings will work fine. Next, keep an eye on how many days’ worth of messages you keep. Depending on the amount of spam you get, keeping too many may put you over the limit. At pN, the limit is 20MB (or was the last time I checked), and this equates to somewhere in the neighborhood of 4000 messages – but that too can vary, depending on the type of spam you get.

When I first started monitoring, I was getting as many as 250 per day, and I was able to keep about two weeks’ worth. But as I updated my spam rules and saw my spam intake drop, I’ve been able to bump up the amount of spam that I keep to closer to three weeks. Currently it’s running at 21 days, and that sees a volume of roughly 3700 messages, for an average of about 175 per day. Even more importantly, I rarely have to mark anything as spam any longer – as soon as it comes in, it goes straight to the junk folder, which is as it should be.
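
As a back-of-the-envelope check on those numbers, here is a small shell sketch. The function name max_retention_days is my own invention for illustration, and the 4000-message capacity is only the rough figure quoted above, so treat the result as an estimate rather than a guarantee.

```shell
# Hypothetical helper: roughly how many days of spam fit under the limit.
# $1 = approximate number of messages that fit (about 4000 at 20MB, per the text)
# $2 = average number of spam messages received per day
max_retention_days() {
    echo $(( $1 / $2 ))   # integer division, rounds down
}

max_retention_days 4000 250   # prints 16
max_retention_days 4000 175   # prints 22
```

That lines up with what I saw: a little over two weeks at 250 per day, and room for three weeks at 175.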

The piece that makes this happen is to train the processor based on your spam, and that’s why I bother keeping spam messages around at all. Since they are in a folder that I don’t bother to look at, it really doesn’t hurt me, and the overall volume of the folder isn’t large (certainly less than 20MB), so it doesn’t affect my storage space. It’s definitely a worthwhile tradeoff for not having a cluttered inbox.

In order to make this happen, you will first need to locate your spam folder in the directory tree. At pN, you can find it in the boxes folder, then under the domain, the username followed by a caret, .imap, and then the actual folder name (for instance, Junk). So it might look something like this:

~/boxes/example.com/username^/.imap/Junk

That’s where you’ll find the contents of all your spam messages (assuming that is the folder you use). And that’s where you’ll want to point the sa-learn script, in order to start learning what is spam. Just run it with the --spam switch, like so:

sa-learn --spam ~/boxes/example.com/username^/.imap/Junk

If you get a message about the content being too large, you’ll need to trim the folder down to less than that limit by removing a few days’ worth of messages first – messages with video or image attachments are an easy way to get rid of a lot of data quickly. Once you have done so, submit the job again, and you will be told that your messages have been submitted to the queue. I try to run the job at least weekly, to pick up any new spam message patterns that have come in over that time. I don’t schedule the job (though I could), as I like to review the spam folder first to make sure that no valid messages are in there, which could skew the learning process.
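
If you would rather find out that the folder is too big before submitting it, something like the following sketch could help. The function names (folder_kb, under_limit, train_spam) and the LIMIT_KB variable are hypothetical, made up for this example, and the 20480 figure is just the 20MB limit mentioned earlier converted to kilobytes. I am also assuming that the disk usage reported by du is close to however the host measures the content, which may not be exactly true.

```shell
# Assumption: 20MB limit (as quoted earlier), expressed in kilobytes.
LIMIT_KB=20480

# Print the total size of folder $1 in kilobytes.
folder_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Succeed when folder $1 is smaller than $2 kilobytes.
under_limit() {
    [ "$(folder_kb "$1")" -lt "$2" ]
}

# Train on folder $1 only if it is under the limit ($2, default LIMIT_KB).
train_spam() {
    folder=$1
    limit=${2:-$LIMIT_KB}
    if under_limit "$folder" "$limit"; then
        sa-learn --spam "$folder"
    else
        echo "$folder is at or over ${limit}KB; trim it before training" >&2
        return 1
    fi
}
```

With those functions loaded, running train_spam ~/boxes/example.com/username^/.imap/Junk would only call sa-learn when the folder is under the limit. It does not replace reviewing the folder by hand first.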

