To build a community, you need your visitors to comment on your site. Unfortunately when you do that, you open up your site to others who you might not want to come calling - namely spammers, who will leave all sorts of garbage on your (virtual) doorstep. While we probably won't ever be able to get rid of them, managing spam feedback is a completely bearable process.

Depending on who you ask, you're likely to get a wide variety of answers on the best avenue to take when it comes to plugins to use or configuration directives to take in the fight against spam. You'll see names like Akismet or Defensio mentioned, and plugins such as MT-Approval and Tiny Turing thrown into the mix. Some will tell you that you need a CAPTCHA and some will tell you that there's just no way to win. In the end, you don't really need much more than a little creativity and some patience.

Direct Access of the Comment Script

The first item up for debate is whether or not the spam-bots will hit your comment script directly (without going to the individual archive entry). While it is possible, after monitoring thousands of spam comments over many months, the answer, at least at this point, is generally that they will not. I think that this is because they need an entry ID, and to get that, it is usually just easiest to submit the form on your individual archive page. Knowing this means that plugins such as CommentChallenge and similar functions offered by MT-Approval are less than worthwhile. You may not want to waste your time looking for something that short-circuits the comment process if not submitted correctly, because it doesn't happen often.

With that said, where these types of plugins - and CAPTCHAs and Tiny Turing - can come in handy is when they put a value in a form field that can be checked on submission to make sure it is correct. In this case, it's a test to verify that the form isn't just being filled in blindly and submitted, which helps confirm that it's not an automated submission. But you also have to consider that your visitors may not like the extra step, and in that case you may want to look at a plugin such as MT-Approval, which can do some of this work behind the scenes, without the visitor actually having to enter any data.

Further Considerations of Automated Comment Spam

Now that we have largely ruled out data being submitted directly to the comment script, we need to cut down on automated submissions through the comment form. To do this, you can again turn to a CAPTCHA-type solution, as mentioned above, or you can implement a JavaScript solution, whereby the real submission target is hidden unless the page is visited by a JavaScript-capable browser. The reason that this works is that most automated posting tools don't support JavaScript. There are just too many other pages where it's not needed, so the support hasn't been developed, and they will take the low-hanging fruit while they can.

All that you need to do is change the form's action attribute to something that isn't the valid script. For instance:

<form name="comments_form" action="<$MTBlogURL$>" method="post">

This example just sets the action to the blog's root URL, which means that nothing useful will happen when the form submits (the user will get the home page, which may be confusing). You may want to point the action at a page with a short explanation of why the visitor is seeing it, such as "You need to turn on JavaScript to comment." The really important part is to add a small bit of JavaScript, placed after the form in the page, so that the action will be set correctly for users with JavaScript turned on. It should look like this:

<script type="text/javascript">
 var a1 = '<$MTCGIPath$>';
 var a2 = '<$MTCommentScript$>';
 // The form name here must match the name attribute on the form (comments_form)
 document.comments_form.action = a1 + a2;
</script>

The reason for breaking it up into two separate variables is so that the full path never appears in one piece in the page source where a bot can see it. It's only assembled when the page runs, and since most bots can't interpret the JavaScript, they'll never find it - they will get the action specified in the form above, which won't work!
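For anyone who would rather not rely on the form name lookup, here is a minimal sketch of the same idea written as two small functions. The names buildCommentAction and armCommentForm are made up for this illustration, and the example paths in the comments are assumptions; in a real template the two arguments would come from <$MTCGIPath$> and <$MTCommentScript$>.

```javascript
// Hypothetical helper (not part of Movable Type): joins the two halves
// of the comment script URL at run time, so the full URL never appears
// in one piece in the page source.
function buildCommentAction(cgiPath, scriptName) {
  // Strip any trailing slashes so exactly one slash separates the parts.
  return cgiPath.replace(/\/+$/, '') + '/' + scriptName;
}

// Hypothetical helper: points the given form at the real comment script.
function armCommentForm(form, cgiPath, scriptName) {
  form.action = buildCommentAction(cgiPath, scriptName);
  return form;
}

// In a page, this would run after the form has been rendered, e.g.:
//   armCommentForm(document.forms['comments_form'],
//                  '<$MTCGIPath$>', '<$MTCommentScript$>');
```

Passing the form in as an argument keeps the lookup in one place, so a mismatch between the form's name and the script is easier to spot.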

It is interesting to note that even if you use the default script name of mt-comments.cgi, you will see a serious downturn in spam comments. By implementing it on my sites, my junk comment folder dropped in size by a factor of almost five. Pretty incredible. No more changing the name of the file - they just can't find it any longer, even though it hasn't moved anywhere! For those who advocate changing the name of your script to avoid detection, this tends to be a bit less than obvious, but it just works.

Some time ago, Mark Carey put out a plugin called MTDisguiseCommentURL that can do this for you. I haven't tried it, but Mark does a good job on his plugins, so if this doesn't make sense to you, you may want to try that route instead.

Now We Can Handle the Manual Spam Problem

Once you've managed to get rid of all those automated bots, you're left with manual spammers. Unfortunately, there isn't a lot that you can do about them. But what you will find is that most of the nonsense postings like "good site" and "thanks for the information" come from automated posting tools that are trying to leave markers that they can search for and come back later, to post other information. When a manual spammer leaves a comment, they want to make the most of their time (since it takes more effort) and they will leave links. Lots of them.

The Plugin Dilemma

You can elect to use something like Akismet or Defensio and block them that way, but you don't really need to do so. An important factor to keep in mind with plugins is that the more you add, the more it may slow down your system - both with loading the back end of Movable Type and with processing comments. Remember the problem with updating servers for SpamLookup a while back? There are still people who haven't done that!

In any case, I've found that you don't even need to use the built-in SpamLookup lookup functions. In fact, I've had them turned off for months, and only realized it when I went to look up something for a friend of mine. I hadn't noticed an uptick in spam at all. Are there exceptions? Of course - some people will post a single link, and they will get through. But that's the exception rather than the rule.

As it turns out, I have most settings in SpamLookup turned off. In the Lookup Settings, I have turned off all lookups, which means that the comment doesn't even leave my server (which should make privacy advocates happy). Though I could use the lookup whitelist, since I am not using any lookups, it's rather pointless to do so.

In the Link Settings, I have all options unchecked except for the Link Limits. Under those options, I have chosen to moderate more than 3 links and junk more than 10 (with a score weight of 10). These are the defaults except for the score weight. I do occasionally use junk keywords, but it happens so rarely that I almost never have to worry about it.
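To make the thresholds concrete, here is a rough sketch of the link-limit rule as a standalone function. This is not SpamLookup's actual code; the function names and the URL-matching pattern are assumptions made for illustration, and only the two limits (more than 3, more than 10) come from the settings described above.

```javascript
// Illustrative only: mimics the two link limits described above.
const MODERATE_ABOVE = 3;  // more than 3 links: hold for moderation
const JUNK_ABOVE = 10;     // more than 10 links: send to junk

// Counts http(s) URLs in the comment text. This is a deliberately simple
// heuristic; a link whose visible text is also a URL is counted twice.
function countLinks(text) {
  const matches = text.match(/https?:\/\/[^\s"'<>]+/gi);
  return matches ? matches.length : 0;
}

// Returns 'junk', 'moderate', or 'publish' based on the link count.
function classifyComment(text) {
  const links = countLinks(text);
  if (links > JUNK_ABOVE) return 'junk';
  if (links > MODERATE_ABOVE) return 'moderate';
  return 'publish';
}
```

A comment with exactly 3 links is published and one with exactly 10 is moderated, since both limits are "more than" rules. A single-link spam comment passes straight through, which is the exception mentioned earlier.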

All other settings are off and I have no other spam plugins installed, and as I mentioned, after running with these settings for a few weeks, my junk folder is actually at the lowest level that it has been for some time. I only keep 3 days' worth of junk, and I have just under 300 comments. Sure, it's a lot, but when I don't have to look through the folder for false positives (how often do you get valid comments with 10 links in them?) and it's less than a fifth of what I was getting previously, I consider that a victory.

With remote servers going offline, and other servers returning false positives, I'm happy to go this route, with settings that catch as much spam as possible without running the risk of killing conversations, while keeping things running as smoothly as possible.

What about you? What are your settings? Which plugins do you use? What sort of results have you seen?

Comments (7)

Battling comment spam seems to be an ongoing and ever-changing field. I've used a few different obfuscation methods over the years and one of the best tricks seems to be to just change once in a while. Spammers eventually become savvy to your method and it becomes the popular thing--so, you change. I'm currently being swayed more and more to the "dark side" of the image captcha because it works so well. ...at least, for the moment.

Hey Dan -

I agree that change is good, and naturally as soon as I say it, it will likely change (in fact, I received the first spam that got through this method yesterday, a few hours before this scheduled post went live), but I think it still stands to reason that most of what is in here is fairly logical.

The reasoning is sound (at least for now!) - automated spammers have a cost of next to nothing, so they can leave anything at all. Since they need to pick up the entry ID, they'll generally need to go through the comment form, which has the ID hidden in a form field, rather than accessing the script directly. Adding a method to get rid of automated spam is a huge chunk of the battle, and simply adding JavaScript takes care of a large chunk of that.

CAPTCHAs can do it too, but you run the risk there of alienating users. Something like CommentChallenge can work in a simple sense, though I prefer TinyTuring a bit better, since it's an easier test (and it varies per entry), but we've already eliminated the automated bots, so even that is really just placing the burden on your users. In that case, you could use something like the beacon system in CommentChallenge (not the challenge-response mechanism), or the more detailed aspects of MT-Approval to do the same - but again, you're victim to the manual attacks.

At that point, you have someone visiting your site, and since they probably have a macro set up to post to your form, it's going to take them some time to do it. Therefore, it's in their best interest to get as many links as possible into the comment, making that one test about all that you may need.

Of course, what works here may not work elsewhere, but it's remarkably effective. It may just need adjusting for the number of links.

What I have found is that the SpamLookup blacklist - especially the use of zen.spamhaus.org as a lookup service - tends to blackball everyone (in the US, anyway), since just about every broadband user is going to end up on the PBL, which means it's a negative result.

Akismet isn't a lot better, and many results end up getting a spam result, so disabling it didn't see an effective increase in spam comments. If it can all be done with a single test, while remaining on my server, so much the better. :)

"Akismet isn't a lot better, and many results end up getting a spam result, so disabling it didn't see an effective increase in spam comments."

I'm curious which site you saw that kind of performance on. Akismet typically operates at four nines of accuracy or higher.

Hi Matt -

Glad to see you drop in. :)

That comment was geared to this site in particular.

To reiterate, in case it wasn't clear (I had to re-read it to make sure I was saying it correctly, and I think I was): When I disabled Akismet (MT-Akismet, since I use MT), I didn't see an increase in spam.

Since I did so, roughly a month ago, I've had a grand total of six comments get through the other defense measures that are in place - essentially, mark a spam as junk if it contains more than 10 links. Those six comments contained a single link, generally to an adult site.

This isn't saying that Akismet wouldn't have caught those comments. It's just saying that to run Akismet only to catch six comments seems like a waste. I can handle the manual junking of six comments over the course of a month. I think most people can.

There are certainly other places where it might be different, and they may see more than that number of comments slip through - but for me, I just don't think it's worth it.

Since you're here, I can also say that I've had clients who run certain topics on their blogs, and they've seen Akismet mark every single comment as spam - perhaps because the subject is generally one that is spammed. In that case, it's less than useful, so they need to consider other options.

What are my settings? These.

In terms of junkers adapting, I haven't been seeing much innovation lately. It's been a few months since I updated my SpamLookup filters and I still get very little junk, one a week or so. My IP ban lists are still in the 10K-20K range, though.

After unsuccessfully trying reCaptcha on MT 4.1, I decided to go with your approach. Pretty much just copy-pasted (except that the javascript function needs to refer to document.comments_form instead of contact_form) and have been very happy with the results. Thanks for the article.

Hi Jonathan -

Glad that it worked well for you!
