Wednesday, 16 November 2011

RewriteRule syntax explained

For a simple URL makeover:
RewriteRule ^oldstuff\.html$  newstuff.html
This checks whether the page requested was oldstuff.html.  If so, it hands the request over to the file newstuff.html, which generates the webpage that gets sent back to the client. The client (the bot or the browser) still thinks it's looking at a page called oldstuff.html.  No 301.




Other notes:  the ^ indicates the start of the page name, so that the rule will match oldstuff.html but not reallyoldstuff.html. The $ indicates the end of the filename, so that this rule will match oldstuff.html but not oldstuff.htmlly.  That backslash in the middle?  Well, that 1st parameter is a regular expression (often referred to as regex), and in regular expressions, . is a wildcard that matches any single character.  Preceding it with a \ is called "escaping" the character, and indicates that we don't mean the wildcard character . but rather we actually mean a literal period.
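Here's a small sketch of that same rule with the matching behavior spelled out in comments. It assumes the .htaccess file sits in the site's root folder and that mod_rewrite is enabled on the server; the non-matching filenames are made up for illustration.

```apache
# Rewriting has to be switched on before any RewriteRule takes effect.
RewriteEngine On

# Matches a request for:   oldstuff.html
# Skips (because of ^):    reallyoldstuff.html
# Skips (because of $):    oldstuff.htmlly
# Skips (because of \.):   oldstuffxhtml
RewriteRule ^oldstuff\.html$ newstuff.html
```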
Now, a 301:
RewriteRule ^oldstuff\.html$  newstuff.html [R=301,L]
This is a 301 redirect.  It's a redirect because we used the R flag inside the [].  It's a 301 because we put =301 after the R; if we'd left that out, it would be a 302 redirect, which indicates we just temporarily moved the page, and links to it wouldn't pass any link juice. 99% of the time you're going to want to use a 301, NOT a 302.


There are two flags inside the [] brackets, separated by a comma.  The second one, the "L", stands for Last.  It says that if the regex pattern matches the page that was just requested, then after whatever processing is done (in this case, the 301 redirect to newstuff.html) we can skip checking the page against any of the other rules in the .htaccess file.  99% of the time you'll want to use the L flag with your 301 redirects.
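As a rough sketch, a small batch of 301s in one .htaccess file might look like the following. The second pair of filenames (oldthings.html, newthings.html) is invented for illustration, and I've assumed the file lives in the site root. The targets here start with a slash so the redirect points at a path from the top of the site and doesn't depend on how RewriteBase is set.

```apache
RewriteEngine On

# Each rule redirects permanently (R=301) and then stops
# processing further rules for this request (L).
RewriteRule ^oldstuff\.html$  /newstuff.html  [R=301,L]
RewriteRule ^oldthings\.html$ /newthings.html [R=301,L]
```

Because every rule carries the L flag, a request for oldstuff.html never gets checked against the second rule.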

92% of the time you'll want to use the L flag with your non-301 rewrites.  Why not 99%?

Sometimes it's helpful to have multiple rewrite rules applied to an incoming URL.  Let's say you have a number of first-level folders which you want to rewrite, plus you have a number of subfolders you want to rewrite as well... each of which occurs in all of the 3 first-level folders.  You can do your main folder name substitution in one RewriteRule (preserving the next level folder as-is for now), then apply a second RewriteRule that preserves the just-updated top folder while rewriting the next folder down.
Example:

Original URL:
  • /prods/metal17/necklace-11623.htm

RewriteRule #1 might substitute /jewelry-products/ for /prods/ so now you have:
  • /jewelry-products/metal17/necklace-11623.htm

RewriteRule #2 might substitute /gold/ for /metal17/ giving you:
  • /jewelry-products/gold/necklace-11623.htm
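One way those two rules could be written is sketched below. This assumes the .htaccess file is in the site root and that both steps are internal rewrites (no redirect); the exact patterns are my guess at what fits the example URLs, not the only way to do it.

```apache
RewriteEngine On

# Rule #1: swap the top-level folder, keep the rest of the path as-is.
# No L flag, so the rewritten path carries on to the next rule.
RewriteRule ^prods/(.*)$ jewelry-products/$1

# Rule #2: with the top folder already updated, swap the next folder down.
RewriteRule ^jewelry-products/metal17/(.*)$ jewelry-products/gold/$1 [L]
```

The first rule deliberately leaves off the L flag. That's what lets the second rule see /jewelry-products/metal17/... instead of the original /prods/metal17/... path.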

Now, for bonus points, let's say we have an entire catalog of jewelry pieces in there, each with a glorious photo named [product ID].jpg. How very convenient for our database and our programmer.  How terribly sucky for SEO for image search. Remember how I said that requests for images go through .htaccess as well? You can use RewriteRule to map the name of the image to something more friendly too, so that you can show Googlebot an image named something like:
  • /images/necklaces/gold/amethyst-11623.jpg

Instead of the real filename:
  • /images/prods/11623.jpg
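A sketch of a rule that could do that mapping is below. It assumes the friendly URL always ends in a dash followed by the numeric product ID, that the folder and product names are lowercase letters (and dashes, for the product name), and that the .htaccess file is in the site root.

```apache
RewriteEngine On

# Friendly name in:  /images/necklaces/gold/amethyst-11623.jpg
# Real file served:  /images/prods/11623.jpg
# The parentheses capture the product ID; $1 drops it into the real path.
RewriteRule ^images/[a-z]+/[a-z]+/[a-z-]+-([0-9]+)\.jpg$ /images/prods/$1.jpg [L]
```

The descriptive words in the URL are ignored by the rule; only the ID at the end decides which file gets served.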
Now, RewriteRule isn't the only way you can do a redirect or do a URL makeover. Next week I'll post about how to do this in your 404 error handler--there are some advantages to doing it there instead, including ease of debugging your translations, ability to translate from words in the URL to IDs by looking them up in your database, and performance benefits for large sites.



