Modifying MT Search

As I mentioned yesterday, I recently managed to hack the Movable Type search module so that it sends Last-Modified dates with my search results. This is useful because it may help save on bandwidth costs: instead of a Last-Modified date of whenever the search was run, the page carries the date the search results themselves were last modified.

If you want to see what I mean, use Web-Sniffer to pull up one of your search result pages. Chances are that you will not see a Last-Modified date on it (look in the HTTP Response Header section of your results). If there is one, it's likely the time you ran the request. This means that every time someone (or some robot) runs that search, the server sends back the entire page dated whenever the search was run, so there's no way to tell whether the results have actually changed. This entry explains how to make that Last-Modified header reflect something useful, like the date of the most recent result.
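Why would a Last-Modified header save bandwidth? A browser, feed reader or cache that has seen the header can send the date back in an If-Modified-Since request header, and if nothing has changed the server can answer "304 Not Modified" with no body instead of the whole page. The sketch below only illustrates that comparison in core Perl. The names parse_http_date and not_modified are hypothetical, only the RFC 1123 date form is handled, and the hack described here emits just the header: whether a 304 is actually sent depends on your web server or on extra code in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations used in HTTP dates (RFC 1123 form).
my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Parse a date like "Sun, 06 Nov 1994 08:49:37 GMT" into epoch seconds.
# Returns undef for anything else (older HTTP date formats are not handled).
sub parse_http_date {
    my ($str) = @_;
    return unless defined $str;
    my ($d, $mon, $y, $h, $m, $s) =
        $str =~ /^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
        or return;
    return unless exists $MONTH{$mon};
    return timegm($s, $m, $h, $d, $MONTH{$mon}, $y);
}

# True (1) if a request carrying If-Modified-Since could be answered with
# "304 Not Modified"; false (0) means the full page should be sent.
sub not_modified {
    my ($last_modified, $if_modified_since) = @_;
    my $lm  = parse_http_date($last_modified);
    my $ims = parse_http_date($if_modified_since);
    return 0 unless defined $lm && defined $ims;   # can't compare: send the page
    return $lm <= $ims ? 1 : 0;
}

my $lm = 'Tue, 15 Jun 2004 14:02:11 GMT';
print not_modified($lm, 'Tue, 15 Jun 2004 14:02:11 GMT'), "\n";  # prints 1 (send 304)
print not_modified($lm, 'Mon, 14 Jun 2004 09:00:00 GMT'), "\n";  # prints 0 (send the page)
```

The comparison is "not modified since", so an equal date counts as unchanged; an unparseable header falls back to sending the full page, which is always safe.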

Adding this feature requires changes to the default Search.pm module included with Movable Type, so you should be at least passingly familiar with both Movable Type and Perl. You should also be comfortable working in the text editor of your choice and be able to follow detailed instructions.

This change also requires the use of Kevin Shay’s excellent UTCDate plugin. This plugin is easy to install and requires only a single additional module, which is often included in the default Perl distribution.

Disclaimer: This is merely an account of what I did to change my own installation. This is in no way a warranty of function or a guarantee against something breaking when you attempt to do the same, and no warranties, guarantees or liability should be implied.

Datclaimer: It ought to work. Really. And if you run into problems, let me know; I'd like to help. But please keep in mind that I'm doing this for the fun of it, I have both a job and a family, and I may not be able to get back to you as quickly as you'd like. In any case, you're the one making the changes to your system, so you're responsible for the results.

Before you do anything, back up Search.pm. Search may be an integral part of your site, and you don't want it broken with no way back. Just copy the file and save it under a different name. You may never need it, but make sure you have it. Sure, you could also get a fresh copy from the default MT installation, but that's generally a lot more work.

First, load Search.pm into that text editor that I mentioned before. I’m sorry, but I’m not going to tell you where to find Search.pm. If you don’t already know where it is, or can’t find it, then you may not want to proceed.

Next, search for this string:

  $ctx->stash('search_string', encode_html($str));

In the default installation, there is only one occurrence. Immediately after that is a closing brace (}), which signifies the end of the iteration. On the line immediately after that closing brace, insert this code:

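  # Take the modification date of the first result; the default
  # search lists the most recent entry first.
  my $lmod = '';
  if (@results) {
    my $entry = $results[0]{entry};
    $lmod = $entry->modified_on;
  }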

This code checks to see if there are results, and if there are, it gets the modified_on date of the first entry in the list, which in the default MT search is the most recent one.
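Taking the first result assumes the list is sorted newest first, and that the newest entry is also the most recently *modified* one. If your search sorts differently, or an older entry was edited later, a safer option is to scan every result for the latest modified_on. Here is a minimal sketch, assuming (as the code above does) that each element of @results is a hash reference whose entry key holds an object with a modified_on method, and that the timestamps are MT-style 14-digit YYYYMMDDHHMMSS strings. FakeEntry and latest_modified are hypothetical stand-ins so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an MT entry object: only modified_on is needed.
package FakeEntry;
sub new         { my ($class, $ts) = @_; return bless { ts => $ts }, $class }
sub modified_on { return $_[0]{ts} }

package main;

# Return the newest modified_on timestamp among the results, or '' if none.
# Fixed-width YYYYMMDDHHMMSS strings sort correctly as plain strings,
# so a string comparison (gt) is enough.
sub latest_modified {
    my @results = @_;
    my $lmod = '';
    for my $result (@results) {
        my $ts = $result->{entry}->modified_on;
        $lmod = $ts if defined $ts && $ts gt $lmod;
    }
    return $lmod;
}

my @results = (
    { entry => FakeEntry->new('20040610093000') },
    { entry => FakeEntry->new('20040614181500') },   # edited most recently
    { entry => FakeEntry->new('20040612120000') },
);
print latest_modified(@results), "\n";   # prints 20040614181500
```

In Search.pm, the loop in latest_modified could stand in for the `if (@results)` block; it still returns an empty string when there are no results, so the later `if ($lmod)` check keeps working.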

Next, search for this string:

  "Building results failed: [_1]", $build->errstr));

In the default installation, there are two occurrences of this string. You want the second (and last) one. On the line immediately following it, insert this code:

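  if ($lmod) {
    # Suppress any warnings emitted while loading the plugin.
    local $SIG{__WARN__} = sub { };
    # Resolved through @INC; Perl 5.26+ drops '.' from @INC,
    # so './plugins/UTCDate.pl' may be needed there.
    require 'plugins/UTCDate.pl';
    # Convert the blog-local timestamp to a GMT date in HTTP format.
    $lmod = MT::Plugin::UTCDate::_hdlr_utc_date(
      $ctx,
      {
        date   => $lmod,
        format => "%a, %d %b %Y %H:%M:%S GMT",
        no_dst => 0,        # change to disable daylight saving time
        offset => 'blog',   # use the blog's own time zone offset
      }
    );
    print "Last-Modified: $lmod\n";
  }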

This section of code checks whether there is a Last-Modified date ($lmod), then uses the UTCDate plugin to convert that date from the blog's local time to GMT in the format HTTP expects, and finally prints the header so that the results will carry the Last-Modified date. If you'd like to disable daylight saving time or use a different offset, change those values in the parameters passed to _hdlr_utc_date; they work just like the arguments used on the UTCDate tag.
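For the curious, here is roughly what that plugin call amounts to, as a core-Perl sketch: take MT's 14-digit timestamp (assumed to be in the blog's local time), shift it by a fixed UTC offset in hours, and format the result as an HTTP date. It is not a replacement for UTCDate: mt_to_http_date is a hypothetical name, the offset is passed in by hand, and there is no daylight saving handling at all. Day and month names are spelled out rather than taken from strftime's %a and %b, since those follow the system locale and HTTP dates must use the English abbreviations.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTH = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert a YYYYMMDDHHMMSS timestamp in local time to an HTTP date.
# $offset_hours is the local offset from UTC, e.g. -5 for US Eastern Standard Time.
# Returns undef if the timestamp is not 14 digits.
sub mt_to_http_date {
    my ($ts, $offset_hours) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $ts =~ /^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/
        or return;
    # Read the wall-clock time as if it were UTC, then undo the offset.
    my $epoch = timegm($s, $mi, $h, $d, $mo - 1, $y) - $offset_hours * 3600;
    my @t = gmtime($epoch);   # (sec, min, hour, mday, mon, year, wday, ...)
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$t[6]], $t[3], $MONTH[$t[4]], $t[5] + 1900, $t[2], $t[1], $t[0];
}

print mt_to_http_date('20040614181500', -5), "\n";
# prints Mon, 14 Jun 2004 23:15:00 GMT
```

Note how an evening time in a zone behind UTC can roll over to the next day in GMT; that is why the weekday is recomputed from the shifted time rather than copied from the local date.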

And that’s it. Save your changes, replace Search.pm on your server, and you should be all set.

To check that your modifications are being used, reload your search results in Web-Sniffer and take a look at the Last-Modified header. It should reflect the date of the most recent entry in the results list, which in the default MT search is the entry at the top of the list.
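Web-Sniffer shows you the header, but the bandwidth savings only show up if a repeat request carrying that date gets a 304 back. If you'd rather check from the command line, here is a rough sketch using HTTP::Tiny (a core module since Perl 5.14): it fetches a search results URL you supply, sends the Last-Modified value back as If-Modified-Since, and reports both status codes. check_conditional is a hypothetical helper, and the script does nothing unless a URL is given as its argument.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch $url, then re-request it with If-Modified-Since set to the
# Last-Modified value from the first response.
# Returns (last_modified, first_status, second_status).
sub check_conditional {
    my ($url) = @_;
    my $http  = HTTP::Tiny->new(timeout => 30);
    my $first = $http->get($url);
    # HTTP::Tiny lower-cases response header names.
    my $lmod  = $first->{headers}{'last-modified'};
    return (undef, $first->{status}, undef) unless defined $lmod;
    my $second = $http->get($url, { headers => { 'If-Modified-Since' => $lmod } });
    return ($lmod, $first->{status}, $second->{status});
}

# Usage: perl check_lm.pl URL-of-a-search-results-page
if (@ARGV) {
    my ($lmod, $s1, $s2) = check_conditional($ARGV[0]);
    print "First request:  $s1\n";
    print "Last-Modified:  ", (defined $lmod ? $lmod : '(none)'), "\n";
    # 304 here means the page was not sent again.
    print "Repeat request: ", (defined $s2 ? $s2 : '(not sent)'), "\n";
}
```

A 200 on the repeat request means the header is there but nothing is acting on If-Modified-Since yet.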

Comments

2 responses to “Modifying MT Search”

  1. Chad Everett

    The processing time and resources of the search request are undoubtedly important as well. Naturally, this hack doesn’t address either, though I’d love to have a discussion on that aspect.

    In any case, the real-world benefit is simply a savings in bandwidth. The XML for a particular entry (this one, for instance, before I added this comment) is about 9KB. Not huge by any means. The XML for The Angler Fish, meanwhile, is about 73KB. If I can send a header that includes Last-Modified information, a repeat request for an unchanged page turns into maybe 1 or 2KB. Not a huge savings per request, but if someone subscribes, it can add up quickly.

    I used the XML of individual entries because search results can vary widely: there might be no results, in which case the page is pretty small, or there could be several entries for broader searches, producing larger files. As with just about anything, your mileage may vary.

  2. Michael Croft

    Hmm. Potentially useful, but I’m less worried about bandwidth than the end-to-end time of a search request. I recently turned on the cache in MySQL and it saves about 40% of the query time on most repeated searches, and it also speeds up rebuilds. What kind of real world performance advantage do you get with this hack?