Now my CSS doesn’t work

A frequent problem with mod_rewrite is that the resulting page, now loaded from the new URL, no longer loads its embedded CSS, images, and JavaScript files. There are two main reasons for this:

1) The embedded files have URLs which are also being rewritten, which isn’t desired.

2) The embedded files have relative URLs, which are now interpreted incorrectly.

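The solution to #1 is to put a RewriteCond in front of your RewriteRule that exempts these URLs from rewriting. A RewriteCond applies only to the single RewriteRule that immediately follows it, so repeat it for each rule that needs it: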

RewriteCond %{REQUEST_URI} !\.(css|jpg|gif|png|js)$ [NC]

That line says “apply the RewriteRule that follows only if the request doesn’t end in one of these file extensions.” The [NC] flag makes the match case-insensitive.
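To see what the condition does, here is a rough model of it in Python. For a simple pattern like this one, Python's re module behaves like Apache's PCRE. re.IGNORECASE stands in for [NC], and negating the match stands in for the leading !. The function name rule_applies is hypothetical.

```python
import re

# The RewriteCond pattern; re.IGNORECASE plays the role of [NC].
STATIC_ASSET = re.compile(r"\.(css|jpg|gif|png|js)$", re.IGNORECASE)


def rule_applies(request_uri: str) -> bool:
    """Model of the leading '!': the following RewriteRule runs only when
    the URI does NOT end in one of the listed extensions."""
    return STATIC_ASSET.search(request_uri) is None


if __name__ == "__main__":
    for uri in ["/one/two", "/css/site.css", "/img/Logo.PNG", "/js/app.js.map"]:
        print(uri, "-> rewritten" if rule_applies(uri) else "-> left alone")
```

The last example is still rewritten. The pattern is anchored with $, and a file such as app.js.map does not end in one of the listed extensions.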

The solution to #2 is to avoid relative URLs in pages whose URLs are going to be rewritten. To load an embedded file, the browser takes the embedded URL and resolves it into a full URL based on the URL of the page itself. Thus, if you have a page at the (rewritten) URL http://example.com/one/two and an embedded image with src="images/bunny.gif", the browser will interpret that as http://example.com/one/images/bunny.gif and request it accordingly. That request will probably fail. You should instead use either an absolute path, starting with a leading ‘/’, or a fully qualified URL that includes the scheme and hostname, so that the browser requests the correct resource.
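Python's urllib.parse.urljoin implements the same relative-reference resolution rules (RFC 3986) that browsers use. That makes it a convenient way to sketch what happens to each kind of URL on the rewritten page from the example above.

```python
from urllib.parse import urljoin

# The rewritten URL of the page from the example above.
page = "http://example.com/one/two"

# A relative reference is resolved against the page's "directory", /one/.
print(urljoin(page, "images/bunny.gif"))
# An absolute path replaces the whole path of the page URL.
print(urljoin(page, "/images/bunny.gif"))
# A fully qualified URL is used exactly as written.
print(urljoin(page, "http://example.com/images/bunny.gif"))
```

The first call prints http://example.com/one/images/bunny.gif, which is why the image goes missing. The other two both print http://example.com/images/bunny.gif.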

Look, an iPhone

Today I encountered a site that purported to look for iPhone user agents, and send them to a separate site. As is often the case, I’m very reluctant to even show it to you, because it’s so awful, but, for the sake of academic integrity, here it is.

What’s interesting about the recipe is trying to work out how it could ever have worked, for anyone.

Where to start? Perhaps this time we could start with a working recipe. The purpose of this recipe, as stated above, is to redirect every request from an iPhone to a particular place. The important parts of that sentence are “every request” and “from an iPhone”. The original also does some twiddling to say “and not these other user agents”, which seems unnecessary, but perhaps there was a reason.

# This enables rewriting in this directory
RewriteEngine On

# Catch iPhone-users first, easiest to discover
RewriteCond %{HTTP_USER_AGENT} Mobile.+Safari
RewriteRule ^ http://beta.mydomain.no/iphone [R,L]

# Catch most familiar web browsers and redirect to web version,
# except Opera Mini and SymbianOS (which identifies itself as Safari)
RewriteCond %{HTTP_USER_AGENT} (MSIE.+Windows\ NT|Lynx|Safari|Opera|Firefox|Konqueror)
RewriteCond %{HTTP_USER_AGENT} !(Opera\ Mini|SymbianOS)
RewriteRule ^ http://beta.mydomain.no/web [R,L]

# Browsers that match neither block, such as other mobile
# browsers, are caught by the final rewrite rule below. We
# could instead leave it out and have nothing happen to the
# requested URL, which is the default.

RewriteRule ^ http://beta.mydomain.no/mobile [R,L]
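As a sanity check on the cascade, here is a rough Python model of it. It assumes that Python's re module is a close enough stand-in for PCRE with these patterns. It relies on two mod_rewrite behaviors: a rule's RewriteConds are ANDed together unless you say [OR], and [L] stops at the first rule that matches. The function name and the sample user-agent strings are illustrative, not taken from real traffic.

```python
import re

# Patterns copied from the RewriteConds above.
IPHONE = re.compile(r"Mobile.+Safari")
DESKTOP = re.compile(r"(MSIE.+Windows\ NT|Lynx|Safari|Opera|Firefox|Konqueror)")
EXCLUDED = re.compile(r"(Opera\ Mini|SymbianOS)")


def destination(user_agent: str) -> str:
    """Return the redirect target the ruleset would choose for a user agent."""
    # Rule 1: iPhone-style agents. [L] means later rules are not consulted.
    if IPHONE.search(user_agent):
        return "http://beta.mydomain.no/iphone"
    # Rule 2: both RewriteConds must hold, because conditions are ANDed.
    if DESKTOP.search(user_agent) and not EXCLUDED.search(user_agent):
        return "http://beta.mydomain.no/web"
    # Rule 3: no conditions, so everything left over lands here.
    return "http://beta.mydomain.no/mobile"


if __name__ == "__main__":
    # Illustrative user-agent strings only.
    samples = [
        "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X) "
        "AppleWebKit/532.9 (KHTML, like Gecko) Version/4.0.5 Mobile/8A293 Safari/6531.22.7",
        "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
        "Opera/9.80 (J2ME/MIDP; Opera Mini/5.0.16823/1428; U; en) Presto/2.2.0",
    ]
    for ua in samples:
        print(destination(ua), "<-", ua[:40])
```

Opera Mini falls through to the mobile site. It matches "Opera" in the second rule but is then excluded by the second RewriteCond.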

Now, the first difference you should notice between this version and the original is that there are about a million fewer .* and () floating around. They were accomplishing nothing, and they made the entire thing a lot harder to read. Perhaps that made it look more like magic, but that’s not actually a benefit. It seems that when people are first introduced to regular expressions, they feel the need to pepper .* all over everything and throw () around everything else. This is completely unnecessary, and it can only slow things down.

The other main difference is the replacement of the regular expression “^[\./](.*)$” with “^”.

I have no idea what was intended by that regular expression. It means “starts with a dot or a slash and is followed by other stuff”, but I can’t guess why that’s desirable. (The backslash is redundant: inside a character class the dot is already literal, so [./] means the same thing.) Worse, if this recipe lives in a .htaccess file, as such recipes usually do, mod_rewrite strips the leading slash from the path before matching, so almost no request will start with either character. Probably not what was intended.

My regex, “^”, matches all URLs, as desired. ^ matches the beginning of a string, and all strings have a beginning, even empty strings.
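A quick way to check what each pattern matches is to try them. This sketch uses Python's re module, whose handling of these simple character classes matches PCRE's. The sample paths are made up; the ones without a leading slash mimic what a RewriteRule sees in a .htaccess file.

```python
import re

ORIGINAL = re.compile(r"^[\./](.*)$")  # pattern from the original recipe
ANYTHING = re.compile(r"^")            # the replacement pattern


def matches(pattern: re.Pattern, path: str) -> bool:
    return pattern.match(path) is not None


if __name__ == "__main__":
    # Inside [...] the escaped dot is just a literal dot, so the class means
    # "a dot or a slash"; a backslash is not in the set.
    for path in ["/index.html", ".hidden", "\\index.html", "blog/post", ""]:
        print(repr(path), matches(ORIGINAL, path), matches(ANYTHING, path))
```

"^" matches every sample, including the empty string. The original pattern misses both "blog/post" and the empty path.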

The moral of all of this is that if a rewrite rule set is completely unintelligible, it’s probably not right. Yes, regular expressions can be complicated, but the thing is, most of the time they’re not, or, at least, don’t need to be.

Oh, one final note. I added an [R] flag to the end of each of the rewrite rules. This isn’t strictly necessary, but I think it’s a good idea. Not for Apache’s sake, but for yours, to remind you explicitly that this is a redirect, and not something else.

Seen Online: Removing index.php from the URL

I came across the following howto on the web the other day, and was amazed at just how many ways one could get such a simple thing wrong. It serves as a great example of how not to do things, while at the same time providing opportunity to show you how you can do things better.

First, here’s the article: http://www.programmingfacts.com/2009/12/24/how-to-remove-index-php-from-url-using-htaccess-mod_rewrite/

I sincerely hope that parts of it will be updated at some point, so don’t be too surprised if what you read there doesn’t seem to line up with my remarks about it. Also, it may seem that I’m being unduly harsh to Mr. Patel, and perhaps I am. But I see stuff like this every day, and Mr. Patel is taking all of the heat for those other articles too. My apologies.

So, let’s start with his recipe:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /([^/]+/)*index.php HTTP/
RewriteRule ^(([^/]+/)*)index.php$ http://www.%{HTTP_HOST}/ [R=301,NS,L]

The concepts here are good enough – it distinguishes between the initial HTTP request from the browser – %{THE_REQUEST} – and the actual URI that ends up being considered for mapping. That way, you can force THE_REQUEST to be one thing, but map it to another. So far, so good.

Let’s look at the RewriteCond.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9} /([^/]+/)*index.php HTTP/

This RewriteCond ignores one of the most important rules of regular expressions: a regular expression is a substring match, unless you force it not to be. So if all you care about is that THE_REQUEST is for something ending in index.php, most of that pattern is unnecessary. Now, this may seem like picking nits, but on a website that receives thousands or millions of requests per minute, all the time spent evaluating unnecessary regex bits is wasted.

Remember that THE_REQUEST is the entire HTTP request string – something that looks like

GET /docs/images/feather.gif HTTP/1.1

And so the regex presented here attempts to match the entire HTTP method (GET or POST or a bunch of other things), hence the {3,9}. But we really don’t care, so that’s wasted time. It also tries to match the entire URI path, and even captures it into backreferences which it then discards. Wasted time *and* memory.

Finally, note that this RewriteCond won’t work at all, because its pattern contains spaces and yet is not enclosed in quotes. Apache treats the first space as the end of the regex and will fail with an error about bad flag delimiters, because it attempts to interpret the next chunk as a flags field, such as [NC] or [OR].

Instead, we really only need this:

RewriteCond %{THE_REQUEST} "/index.php HTTP"

Or, perhaps, if you want to be even more minimalist, and perhaps less clear to read:

RewriteCond %{THE_REQUEST} "index.php H"

But then we’re trading performance for readability, so you’ll have to make a judgment call on that.
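Here is a small Python model of the simplified condition, assuming Python's re behaves like PCRE for this pattern. It shows that an unanchored pattern matches anywhere in the request line. The helper name asks_for_index_php is hypothetical.

```python
import re

# Unanchored, so it may match anywhere in the request line.
INDEX_REQUEST = re.compile(r"/index.php HTTP")


def asks_for_index_php(the_request: str) -> bool:
    """Model of: RewriteCond %{THE_REQUEST} "/index.php HTTP" """
    return INDEX_REQUEST.search(the_request) is not None


if __name__ == "__main__":
    for line in ["GET /blog/index.php HTTP/1.1", "GET /blog/ HTTP/1.1"]:
        print(line, asks_for_index_php(line))
```

THE_REQUEST is the raw request line, so a query string sits between index.php and HTTP. A request such as GET /index.php?page=2 HTTP/1.1 therefore matches neither this condition nor the original one.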

Next, the RewriteRule:

RewriteRule ^(([^/]+/)*)index.php$ http://www.%{HTTP_HOST}/ [R=301,NS,L]

Remember that the goal is to redirect to a URL that still works, but which lacks the ‘index.php’ on the end of it, in the mistaken belief that this will improve your search engine ranking. (It won’t, but that’s an article for another day.) However, this rewrite rule not only doesn’t do that, but very likely redirects to the wrong hostname entirely. It’s pretty clear that this rule was never tested, since it won’t work.

First of all, although it captures the leading path (i.e., the bit before /index.php) so that it can redirect to the correct path (such as /application/index.php or /wordpress/index.php, but without the “index.php”), it then discards it instead of using it in the redirection URL.

Secondly, it seems to assume that HTTP_HOST is lacking the ‘www’ prefix, which it may or may not be. So you may end up redirecting from www.example.com to www.www.example.com, which likely won’t work.

What it does get right is the use of the NS flag, so that it doesn’t enter a redirection loop on a subrequest, since mod_dir will map “/” back to “/index.php” in such a subrequest.

In the RewriteCond, we have already determined that the request is something ending in index.php, so we really don’t need to go to any trouble at all to craft a complicated regex to re-verify this. Instead, we only have to capture the bit that comes before index.php. Here’s what we need:

RewriteRule ^(.*)index.php$ http://%{HTTP_HOST}/$1 [R=301,NS,L]

Note that Mr. Patel has assumed that we’re doing all of this in a .htaccess file, something which I object to on principle, but will let go for now. In that context, mod_rewrite strips the directory prefix from the path before the RewriteRule pattern sees it, so we can’t assume that the path will start with a slash, as it would in server config scope. If you use these rules in your main config, you’ll need to tweak them accordingly.

We capture the leading part of the path, if any, in $1, which we then use in the redirection target.

If there is a leading path, it will end with a slash, which ends up on the end of the redirection URL. If there isn’t, well, we’ve already put a slash on there, so it accomplishes the same end.
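To make the difference concrete, here is a Python sketch comparing the original substitution with the fixed one. It assumes .htaccess context, so the paths have no leading slash. It models only the pattern match and the substitution, not the RewriteCond or the flags. The function names are hypothetical.

```python
import re
from typing import Optional

ORIGINAL_RULE = re.compile(r"^(([^/]+/)*)index.php$")
FIXED_RULE = re.compile(r"^(.*)index.php$")


def original_target(path: str, http_host: str) -> Optional[str]:
    """Model of the original substitution http://www.%{HTTP_HOST}/.
    The captured path is never used, and 'www.' is prepended blindly."""
    if ORIGINAL_RULE.match(path) is None:
        return None
    return "http://www." + http_host + "/"


def fixed_target(path: str, http_host: str) -> Optional[str]:
    """Model of the fixed substitution http://%{HTTP_HOST}/$1."""
    m = FIXED_RULE.match(path)
    if m is None:
        return None
    # With the RewriteCond in place, $1 is either empty or ends in "/".
    return "http://" + http_host + "/" + m.group(1)


if __name__ == "__main__":
    for path in ["index.php", "blog/index.php"]:
        print(path, original_target(path, "www.example.com"),
              fixed_target(path, "www.example.com"))
```

For blog/index.php on www.example.com, the original sends the browser to http://www.www.example.com/. The fixed rule sends it to http://www.example.com/blog/.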

I know I’ve been very long-winded here, but I wanted to demonstrate the dangers of these kinds of articles. Someone posts nonsense, and it gets re-tweeted a dozen times, and suddenly folks think that it’s their fault that they can’t get it working.

So, the full recipe, which will actually work:

RewriteEngine On
RewriteCond %{THE_REQUEST} "/index.php HTTP"
RewriteRule ^(.*)index.php$ http://%{HTTP_HOST}/$1 [R=301,NS,L]

And another day we’ll discuss the fallacy that removing index.php from your URLs actually helps your search engine ranking. It doesn’t, but I suppose people like to feel that they’re at least doing something.

Canonical hostnames

In this article, we consider the subject of canonical hostnames. There are a variety of reasons for wanting a canonical hostname. The most compelling of these is that cookies (can) break if you try to use them across multiple hostnames, so it’s a good idea to force a particular hostname from the first entry into the site.

The term “canonical hostname” means that we want to force everyone to use one preferred hostname, no matter what hostname they used to get to our site.

There are two ways to accomplish this.

The first of them is what will likely be used if you have a traditional virtual host setup, and want to do this redirection within the virtual host configuration. Here we appeal once again to the rule of thumb: don’t use mod_rewrite until you run out of other options. The other options are almost always more efficient and less prone to error.

In your virtualhost configuration, you’ll have two virtual hosts – one for the canonical hostname, and the other for any other valid hostnames that we’d like to redirect:

<VirtualHost *:80>
ServerName example.com
ServerAlias one.example.com two.example.com three.example.com
Redirect / http://www.example.com/
# ...
</VirtualHost>

<VirtualHost *:80>
# Canonical hostname
ServerName www.example.com
# ...
</VirtualHost>

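The Redirect directive also redirects sub-paths, appending the rest of the path to the target. That is, “Redirect /” will also redirect /one to http://www.example.com/one and /one/two to http://www.example.com/one/two. So the above configuration will redirect ANY URL on example.com, one.example.com, two.example.com, or three.example.com to the same path on www.example.com.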

Now, some folks will want to do this with mod_rewrite, for various reasons, the most common of which is that they don’t have access to the server configuration file, and so need to use a .htaccess file.

To do the same thing with mod_rewrite, place the following in the .htaccess file in your document root directory:

RewriteEngine On
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R,L]

A few remarks about this ruleset:

1) If you want to specify a particular redirect status code, such as 301 or 302, you can put it after the R: [R=301,L]. A bare [R] uses 302.

2) The RewriteCond says “if the requested host is NOT www.example.com, run the following rule.”
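If it's easier to follow as code, here is a minimal Python model of this ruleset. It assumes the request has already reached the .htaccess file, and the function name is hypothetical. Note that the = comparison is an exact, case-sensitive string test, not a regex.

```python
from typing import Optional

CANONICAL_HOST = "www.example.com"


def canonical_redirect(http_host: str, request_uri: str) -> Optional[str]:
    """Model of:
        RewriteCond %{HTTP_HOST} !=www.example.com
        RewriteRule ^ http://www.example.com%{REQUEST_URI} [R,L]
    Returns the redirect URL, or None when no redirect happens."""
    # '!=' is an exact, case-sensitive string comparison, not a regex.
    if http_host == CANONICAL_HOST:
        return None  # already canonical, so the redirect cannot loop
    return "http://" + CANONICAL_HOST + request_uri
```

A request for http://one.example.com/a/b is sent to http://www.example.com/a/b. A request that already uses www.example.com passes through untouched.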

Why aren’t my rules doing anything?

If you’re using .htaccess files for your rewrite rules, and nothing at all appears to be happening, chances are pretty good that the problem isn’t with your rules, but with your .htaccess file.

But first, a brief word about .htaccess files.

The old myth was that you were required to put authentication directives in .htaccess files. After all, the name has ‘access’ in it, right? The new myth appears to be that you are required to put rewrite directives in .htaccess files. This is just as untrue as the old myth.

.htaccess files are for people who don’t have write permission to the server configuration file (often called httpd.conf, but it may be called something else). Now, if that’s you, great, use .htaccess files. But if you do have write permission to the server configuration file, you should put your rules there. It’s more efficient. It’s easier to debug. It’s less restrictive. It’s more secure.

Ok, back to the subject at hand.

The very first thing to check, when you have rewrite directives in a .htaccess file that appear to be doing nothing at all, is whether .htaccess files are enabled. This is very easy to do. Add a single bogus directive to your .htaccess file, and see whether browsing to that location results in a 500 Internal Server Error. So, for example, add the following to your .htaccess file:

BogusDirective Here

Yes, that literal line.

If browsing to that directory does not result in a server error, that is strong evidence that your .htaccess files are not being read at all, most likely because they’re not enabled.

To enable .htaccess files, you’ll need to set the AllowOverride directive appropriately. In the default configuration file that ships with Apache, it’s set to None, which means “don’t allow overrides.” Or, to put it differently, “AllowOverride None” means “please ignore my .htaccess files.”

In your server configuration file, you’ll see the AllowOverride directive in several places. Make sure that you set the right one. In particular, DO NOT change the one that appears in a <Directory /> block, because this block refers to your file system root directory, not your website root directory. Look instead for a block that points to your document root directory. This will look, perhaps, like <Directory /var/www> or maybe <Directory /usr/local/apache/htdocs>. Within that block, you’ll find an AllowOverride directive which is set to None.

Now, exactly what you should set it to is a matter of some debate. Some folks will tell you to just set it to All and be done with it. Others will tell you to set it to FileInfo, so that only rewrite (and related) directives are permitted. There’s no single right answer to this. You need to answer for yourself the question “how much do I trust my users?” and set it appropriately. You can see all the possible values in the AllowOverride documentation. I would recommend:

AllowOverride AuthConfig FileInfo Indexes

Once again, a reminder that if you do in fact have write access to the server configuration file, you should consider putting your rules there instead of messing about with .htaccess files.


Seen Online: A deeper look …

Just got done looking through Joseph Pecoraro’s new mod_rewrite overview. I’d rate this a ‘very good’. Good job, Joseph. Just a few comments about what you’ve written.

  1. You seem to have some typesetting problems, with mod_rewrite appearing as mod\_rewrite a few places.
  2. Yes, the voodoo remark is a huge turn-off. Note that it has been removed in the latest version of the documentation, at http://httpd.apache.org/docs/trunk/rewrite. I’ve never been a fan of those quotes, and I’ve finally gotten rid of them, because they scare people off unnecessarily.
  3. In Apache 2.2, ‘httpd -M’ is a shorter way to get the same listing as ‘apachectl -t -D DUMP_MODULES’: both list static and shared modules, whereas ‘httpd -l’ lists only the modules compiled into the server.
  4. I have to disagree with your remark about <IfModule>. I tend to think that <IfModule> should be avoided, because it hides problems. If the module is not loaded, you get a silent failure and you’re not quite sure what happened. If it fails, you want to know about it.
  5. Yes, the AddModule directive went away in Apache 2.0. You used to have to use AddModule and LoadModule for every module that you wanted to use. This was annoying, so the two directives were consolidated into one.

Rewrite Everything

A common request on various forums and IRC is to have all requests rewrite to a single handler that will process all of your requests. This kind of setup is often called a front controller, and is used by a number of web applications that have a single dispatcher, such as index.php, or dispatcher.rb, which is responsible for parsing the requested URI and mapping the requests to various sub-handlers.

The difficulty with this is not so much rewriting everything, but ensuring that actual static files on disk, such as images, CSS, JavaScript, and legacy HTML content, are served correctly and not mapped to the dispatcher.

This is where the -f and -d tests come in. They check whether the requested resource is an existing file (-f) or directory (-d) by looking for it on disk.

So, your rewrite rules will look something like this:

RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/pony
RewriteRule . index.php [PT]

Here, I’m assuming that this is placed in a .htaccess file in the root directory of your site. If it isn’t, you may need to adjust the RewriteBase directive. RewriteBase tells mod_rewrite which URL path corresponds to the directory holding the .htaccess file, so that the relative substitution index.php is turned into the right URL. For example, if this is in a /blog subdirectory on your site, set RewriteBase to /blog so that requests are handed to /blog/index.php. (The -f and -d tests don’t depend on RewriteBase, because they check REQUEST_FILENAME, which is already a filesystem path.)

What’s up with the “pony” line? I’m glad you asked.

Sometimes you’ll have content on your website that you’ll need to explicitly exclude from the RewriteRules. This will be when you’ve got an Alias, or some other way that you’re mapping URL space to content that doesn’t involve the filesystem. You’ll need to explicitly tell mod_rewrite to ignore that URL when considering what to rewrite.
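Here is a rough Python model of the decision the three RewriteConds make. It simplifies REQUEST_FILENAME to the document root joined with the URL path, which holds for a plain setup without Alias or other mappings. goes_to_dispatcher and make_demo_docroot are hypothetical helpers, and the demo document root is a throwaway temporary directory.

```python
import tempfile
from pathlib import Path


def goes_to_dispatcher(docroot: Path, request_uri: str) -> bool:
    """Rough model of the three RewriteConds: send the request to index.php
    only if it is not an existing file (!-f), not an existing directory
    (!-d), and not under /pony (!^/pony)."""
    # Simplification: REQUEST_FILENAME is taken to be docroot + URL path.
    candidate = docroot / request_uri.lstrip("/")
    return (
        not candidate.is_file()
        and not candidate.is_dir()
        and not request_uri.startswith("/pony")
    )


def make_demo_docroot() -> Path:
    """Hypothetical helper: a throwaway docroot holding one static file."""
    root = Path(tempfile.mkdtemp())
    (root / "css").mkdir()
    (root / "css" / "site.css").write_text("body { color: black; }\n")
    return root


if __name__ == "__main__":
    root = make_demo_docroot()
    for uri in ["/css/site.css", "/css", "/blog/hello-world", "/pony/gallery"]:
        print(uri, "-> index.php" if goes_to_dispatcher(root, uri) else "-> left alone")
```

Only /blog/hello-world reaches the dispatcher. The stylesheet and its directory exist on disk, and /pony is excluded explicitly.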

One final note. In Apache 2.4, there’s a way to do all of this with a single directive, FallbackResource. You can replace the above rewrite block with:

FallbackResource /index.php

With any luck, 2.4 will get released within the next few months, and we won’t have to mess with RewriteRule at all for this rather simple scenario.


What we’re doing here

Over the course of the last few years, mod_rewrite has become synonymous with SEO. This is unfortunate, for a number of reasons. One is that the folks who believe this equation seem to be under the mistaken impression that URLs, and only URLs, are responsible for their sites’ placement on search engines. And another is that they write countless inefficient and incorrect tutorials on how to transform their poorly designed URLs into “friendly” or “clean” URLs, for the benefit of these poorly-understood search engines.

We’re starting this website with several goals in mind:

1) Provide correct, efficient recipes for performing common tasks with mod_rewrite.

2) Point out some of the errors that are being made in other articles, and teach you to recognize them yourselves, and, ideally, raise the overall quality of these sorts of articles.

3) Dispel some of the myths that are growing up around the use of mod_rewrite.

So, hopefully you’ll start seeing articles here at the rate of about one per week. Meanwhile, it’s worth pointing out that we’ve recently done a major overhaul of the official mod_rewrite docs, and I encourage you to take a look at them over the coming months as we continue that process.