A subject came up today about writing a regular expression to block a particular text string in comments received by a Movable Type blog. If you aren’t familiar with it, the SpamLookup plugin gives you a great deal of control over how comments (and trackbacks, if you happen to receive them) are processed. It parses their content to help determine whether each comment or trackback is spam before it is posted to your live site.
The catch is that blocking a specific string usually means writing a regular expression. Regular expressions don’t have to be complex, but they can certainly be daunting for a beginner. There are many sites that teach them, but I’ve found that searching for examples and then using trial and error often yields the best results. Perhaps starting with a tutorial is a good idea, too.
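If the text you want to block is a fixed phrase rather than a pattern, the main trap is that characters such as `.`, `+`, `?` and parentheses have special meanings in a regular expression. Here is a minimal Python sketch that escapes a literal phrase and makes it case-insensitive. The phrase and the function name are hypothetical, and the exact syntax SpamLookup’s Keyword Filter expects (for example, whether it wants surrounding slashes or trailing flags) isn’t covered here.

```python
import re

def literal_phrase_pattern(phrase):
    """Build a case-insensitive pattern that matches `phrase` literally.

    re.escape() backslash-escapes regex metacharacters such as '.', '+', '?'
    and '(' so they lose their special meaning. The word-boundary anchors
    stop the phrase matching inside a longer word; they only behave as
    intended when the phrase begins and ends with a letter or digit.
    """
    return r"(?i)\b" + re.escape(phrase) + r"\b"

# Hypothetical phrase a spammer keeps posting.
pattern = literal_phrase_pattern("buy-now.biz")
print(bool(re.search(pattern, "Visit BUY-NOW.BIZ today!")))  # True
print(bool(re.search(pattern, "visit buy-nowxbiz today")))   # False: the '.' must be literal
```

Without the escaping, the unescaped `.` would match any character, so “buy-nowxbiz” would be blocked too.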
Once you feel you have the basics down and want to test your pattern, you could enter it into SpamLookup and then post a comment to see what happens. And you probably should. But first, you may want to try an online regular expression tester: enter your pattern, paste in some sample text, and see whether it matches. Make sure the tester is using Perl-compatible regular expressions, and you should be up and running in no time.
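You can also check a handful of patterns offline against sample comments, both good and bad, before pasting them into SpamLookup. This is a minimal Python sketch, assuming your patterns are simple enough that Python’s `re` module reads them the same way Perl would (most simple ones are). The patterns and sample comments are made up for illustration, not a recommended block list.

```python
import re

# Hypothetical keyword-filter patterns, one per entry.
PATTERNS = [
    r"(?i)\bcheap\s+(?:pills|meds)\b",
    r"(?i)casino",
]

# Sample comments paired with whether they *should* be blocked.
SAMPLES = [
    ("Great post, thanks for writing it up.", False),
    ("CHEAP   pills here!!!", True),
    ("Best online Casino bonuses", True),
    ("I read this on my cheap laptop.", False),
]

def first_match(text, patterns):
    """Return the first pattern that matches text, or None if none do."""
    for p in patterns:
        if re.search(p, text):
            return p
    return None

def check(samples, patterns):
    """Return (text, expected, actual) for every sample the patterns get wrong."""
    failures = []
    for text, expected in samples:
        blocked = first_match(text, patterns) is not None
        if blocked != expected:
            failures.append((text, expected, blocked))
    return failures

if __name__ == "__main__":
    for text, expected, blocked in check(SAMPLES, PATTERNS):
        print(f"MISMATCH: {text!r} expected={expected} got={blocked}")
```

An empty run means every sample behaved as expected; adding a legitimate comment that happens to mention a blocked word is a quick way to spot false positives.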
But there is more to SpamLookup than regular expressions. In fact, these days I rarely use them, simply because I don’t like having to maintain a list of patterns. I’m more a fan of the “set it and forget it” mentality. If you’ve never looked at SpamLookup before, take a look now. You’ll find that it has three distinct sections: Lookups, Link, and Keyword Filter. The last of these, Keyword Filter, is where you enter regular expressions. But the first two are quite powerful in their own right.
The Lookups section offers perhaps the most power: you can have SpamLookup check IP blacklist and domain blacklist services, and if the commenter is listed, use that information to block (or moderate) the feedback. Simply enter the service you’d like to use in the appropriate box. You can also whitelist particular addresses, so that feedback coming from those sources won’t be looked up at all!
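For the curious, blacklist services of this kind are commonly queried over DNS. For an IPv4 address, the octets are reversed and the blacklist’s zone is appended; any answer means the address is listed. Domain blacklists work the same way, using the domain name instead of a reversed IP. The Python sketch below shows that convention plus a whitelist check. The zone names and function names are hypothetical, and SpamLookup’s own implementation may differ in the details.

```python
import ipaddress
import socket
from typing import Iterable

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Name to query for an IPv4 blacklist: reversed octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def domain_query_name(domain: str, zone: str) -> str:
    """Name to query for a domain blacklist: the domain itself plus the zone."""
    return domain.rstrip(".").lower() + "." + zone

def is_whitelisted(ip: str, whitelist: Iterable[str]) -> bool:
    """True if ip equals a whitelisted address or falls in a whitelisted CIDR range."""
    addr = ipaddress.ip_address(ip)
    # strict=False lets a bare address like "10.0.0.1" act as a /32 network.
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist)

def is_listed(query_name: str) -> bool:
    """Live DNS lookup: any A record means "listed"; a resolution failure means "not listed"."""
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False
```

So checking 192.0.2.1 against a hypothetical zone `bl.example.org` means resolving `1.2.0.192.bl.example.org`, and only if the address isn’t whitelisted. Note that `is_listed()` performs a real network lookup.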
The Link section is useful, but be careful here, because you may unwittingly be giving spammers more power than they deserve. Specifically, I usually turn off the “credit” checkboxes, because I don’t want to credit anyone. Giving credit for the absence of links means a spammer can post bogus content without a link and still have it make it onto the site. I don’t want that. Similarly, I don’t want commenters to get credit just because they once managed to get a comment accepted and their email or URL is already in the comment database. As for the number of links at which feedback is moderated or junked, that’s up to you.
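To make the thresholds concrete, here is a hypothetical Python sketch of link-count filtering with the credit options turned off: a comment with no links earns nothing, and two numbers decide when to moderate and when to junk. The function name, the crude `http(s)://` counter, and the “more than N links” semantics are assumptions for illustration, not SpamLookup’s actual code.

```python
import re

# Crude link counter for illustration only; it just counts "http://" and "https://".
LINK_RE = re.compile(r"https?://", re.IGNORECASE)

def link_verdict(comment: str, moderate_over: int, junk_over: int) -> str:
    """Classify a comment by link count using two hypothetical thresholds.

    No credit is given for having zero links: a link-free comment is simply
    "neutral" and is left for the other filters to judge.
    """
    links = len(LINK_RE.findall(comment))
    if links > junk_over:
        return "junk"
    if links > moderate_over:
        return "moderate"
    return "neutral"
```

With `moderate_over=1` and `junk_over=3`, a comment with two links is held for moderation and one with four is junked.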
Which brings us to Akismet. This service, developed by Automattic, the makers of WordPress, is a collaborative spam filter: everyone who uses it submits their comments to a centralized database, where they are analyzed together for patterns that indicate spam. It’s similar to SpamAssassin, a wildly popular (and successful) open source spam filter for email.
However, it does require that you install a plugin. Luckily, two have been developed for Movable Type: an official one and an unofficial one. Download either (or both). To use a plugin, you’ll need your own API key from WordPress.com. You don’t have to sign up for a blog; you just need a key. Enter it into the plugin and you’re ready to go. You may also want to adjust how the plugin scores the results Akismet returns, but other than that you’re done.
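Behind the scenes, the plugin talks to Akismet’s REST API. The Python sketch below only builds the two requests involved in getting started: verifying your key and checking a comment. It follows the 1.1 API’s `verify-key` and `comment-check` calls, using the key-prefixed host for the latter. Newer Akismet documentation may prefer passing the key as a parameter instead, so treat the URLs as an assumption to check against Akismet’s current docs. The function names are hypothetical, and this is not either plugin’s actual code.

```python
from urllib.parse import urlencode
from urllib.request import Request

def verify_key_request(api_key: str, blog_url: str) -> Request:
    """POST to verify-key; Akismet answers with the body 'valid' or 'invalid'."""
    data = urlencode({"key": api_key, "blog": blog_url}).encode("ascii")
    return Request("https://rest.akismet.com/1.1/verify-key", data=data, method="POST")

def comment_check_request(api_key: str, blog_url: str, user_ip: str,
                          user_agent: str, author: str, content: str) -> Request:
    """POST to comment-check using the key-prefixed host form of the 1.1 API."""
    fields = {
        "blog": blog_url,          # the site the comment was left on
        "user_ip": user_ip,        # commenter's IP address
        "user_agent": user_agent,  # commenter's browser user agent
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    data = urlencode(fields).encode("ascii")
    return Request(f"https://{api_key}.rest.akismet.com/1.1/comment-check",
                   data=data, method="POST")

def is_spam(response_body: str) -> bool:
    """comment-check returns the literal body 'true' for spam, 'false' otherwise."""
    return response_body.strip() == "true"
```

To actually send a request, pass it to `urllib.request.urlopen()` and feed the decoded body to `is_spam()`. The key and blog URL you use are your own.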
Which plugin should you use? That’s up to you. I prefer the official one, because I think it works a bit better, and because when you junk a comment it reports it back to Akismet to update the central database. When I last checked, the unofficial plugin didn’t do this. It should still work, but it won’t help the centralized tracking, so decide for yourself.
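Reporting junked comments back to Akismet corresponds to its `submit-spam` call, and there is a matching `submit-ham` call for legitimate comments that were wrongly flagged. A minimal Python sketch that only builds the request, assuming the key-prefixed host form of Akismet’s 1.1 API (check Akismet’s current docs). The function name is hypothetical, and this is not the official plugin’s code.

```python
from urllib.parse import urlencode
from urllib.request import Request

def feedback_request(kind: str, api_key: str, blog_url: str, user_ip: str,
                     user_agent: str, author: str, content: str) -> Request:
    """Build a report to Akismet: 'spam' for missed spam, 'ham' for false positives."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # Send the same details that were used for the original comment check.
    fields = {
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    data = urlencode(fields).encode("ascii")
    return Request(f"https://{api_key}.rest.akismet.com/1.1/submit-{kind}",
                   data=data, method="POST")
```

Sending these reports is what lets a junked comment improve the shared database for everyone else.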
At this point, you’re set. Both Akismet and SpamLookup are running and protecting your blog.