mod_rewrite

Introduction and Background

The mod_rewrite module of the Apache web server lets you rewrite a requested URL, and therefore change which file is served, before the file is retrieved from disk and sent to a client browser. This can offer convenience, security, and flexibility.

mod_rewrite helped me solve a problem when I moved web pages from a "NASA" subdirectory to a "nasa" subdirectory. Originally, I simply put a meta refresh tag in the header of the index.htm file in the old NASA directory, because I was able to keep both directories under Linux/Apache (Linux distinguishes between upper- and lower-case names):

<meta http-equiv="REFRESH" content="7;url=http://www.increa.com/nasa2000/">

Then I copied the web pages down onto my Linux box for backup. No problem. Then I backed up my home directory to a Windows computer ==> problem! Windows does not distinguish between upper and lower case. I needed a different way to direct a request for NASA to nasa. I didn't want to abandon all the inbound links to the upper-case web page, so I looked into other ways of forwarding without retaining the upper-case directory.

Solution Examples - Research

First, some resources:


In my situation, after the web server turns your URL into a local file path to retrieve whatever file you've asked for, it processes the path name (including the file name) with the directives in .htaccess files, starting from the root web directory and working all the way down to whatever directory your requested file lives in.
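The lookup order described above can be sketched in a few lines of Python. This is only a simplified model of the description, not something Apache runs: the function name htaccess_candidates is made up for illustration, and a real server also depends on its AllowOverride settings.

```python
from pathlib import PurePosixPath


def htaccess_candidates(web_root, requested_file):
    """List the .htaccess paths consulted for a file, web root first.

    Simplified model: assumes requested_file lives under web_root.
    """
    root = PurePosixPath(web_root)
    target_dir = PurePosixPath(requested_file).parent
    # Directory names between the web root and the file's directory.
    relative_parts = target_dir.relative_to(root).parts

    candidates = [root / ".htaccess"]
    current = root
    for part in relative_parts:
        current = current / part
        candidates.append(current / ".htaccess")
    return [str(p) for p in candidates]


for path in htaccess_candidates("/home/public_html",
                                "/home/public_html/nasa2000/image001.jpg"):
    print(path)
```

The web root's .htaccess comes first in the list, followed by one entry for each directory on the way down to the requested file.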

The HTTP protocol defines status codes that the server can send back to your web browser, and the Redirect (R) flag selects which one is sent. I'm interested in status code R=301, "page has permanently moved" (in other words, the browser is told to stop asking for the old URL and to ask only for the new one). Without an explicit code, R sends a 302 temporary redirect.

In addition to announcing the page permanently moved, the server needs to send the replacement URL when using the R flag. Apache's link above shows examples using R without providing the "http:" part. The Beginner's Guide linked above says you have to provide the "http:" part when R is used.

A typical .htaccess file is shown here to introduce the 3 common Rewrite commands. This file prevents others from linking to my pictures and using up my bandwidth for their web page displays. It forces a small blank replacement picture instead.
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?increa\.com/.*$ [NC]
RewriteRule \.(gif|jpg|png)$ http://www.increa.com/replacement.gif [R,L]

In this example, after turning the engine on, if all of the RewriteCond conditions are true,...
  1. Referrer is not empty (^$ matches an empty string, and the leading ! negates the match).
  2. Referrer is not my own site, with or without the leading "www" (NC makes the comparison case-insensitive).
...then the RewriteRule applies: if the requested path ends in a "." followed by a graphic file type (gif, jpg, or png), the browser is redirected to a single replacement graphic file instead.
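To make the logic of the two conditions and the rule concrete, here is a small Python sketch of the same decision. It is a model for experimenting with the patterns, not Apache itself; the function name hotlink_redirect and the sample paths and referrers are invented for illustration, and the dots in the host name are escaped so that they match literally.

```python
import re

# Second RewriteCond pattern; re.IGNORECASE plays the role of [NC].
OWN_SITE = re.compile(r"^http://(www\.)?increa\.com/.*$", re.IGNORECASE)
# RewriteRule pattern: the path must end in one of these extensions.
IMAGE = re.compile(r"\.(gif|jpg|png)$")
REPLACEMENT = "http://www.increa.com/replacement.gif"


def hotlink_redirect(path, referer):
    """Return the redirect target, or None if the request is served as-is."""
    referer_not_empty = referer != ""                   # !^$
    referer_not_own = OWN_SITE.search(referer) is None  # !^http://...
    if IMAGE.search(path) and referer_not_empty and referer_not_own:
        return REPLACEMENT
    return None


print(hotlink_redirect("pics/moon.jpg", "http://example.org/page.html"))
print(hotlink_redirect("pics/moon.jpg", "http://www.increa.com/nasa2000/"))
print(hotlink_redirect("pics/moon.jpg", ""))
```

Only the first call returns the replacement URL. The other two return None because the referrer is either my own site or empty (a bookmark or a typed-in address, for example).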

Another useful mod_rewrite sequence redirects a not-yet-populated subdomain to a subdirectory under the main domain. In other words, it turns wiki.increa.com into increa.com/wiki.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^wiki\.increa\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.wiki\.increa\.com$
RewriteRule ^(.*)$ http://www.increa.com/$1 [R,L]


Before the server checks the .htaccess file in a given directory, it checks the .htaccess file in every higher-level directory, working from the web root downward. In my web root, these directives test whether the request was made to one of the wiki host names, and if so, the URL is rewritten to go to the alternate directory.

Notice that after the server builds the new URL, the browser requests it and the same rules are evaluated again, so the conditional checks prevent an infinite loop from occurring. In other words, the RewriteCond lines make the rule apply only to "wiki.increa.com" or "www.wiki.increa.com", and never to "www.increa.com", where it would redirect forever.
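The loop-avoidance argument can be checked with a short Python sketch. It only models the two host conditions and the rule; the function name subdomain_redirect and the captured path "wiki/index.php" are hypothetical values chosen for illustration.

```python
import re

# The two RewriteCond patterns, joined by [OR] in the .htaccess file.
WIKI_HOSTS = [
    re.compile(r"^wiki\.increa\.com$"),
    re.compile(r"^www\.wiki\.increa\.com$"),
]


def subdomain_redirect(host, captured_path):
    """Return the redirect URL, or None if the rule does not apply.

    captured_path stands for whatever ^(.*)$ matched ($1 in the rule).
    """
    if any(pattern.search(host) for pattern in WIKI_HOSTS):
        return "http://www.increa.com/" + captured_path
    return None


# First request: the host is the subdomain, so the rule fires.
print(subdomain_redirect("wiki.increa.com", "wiki/index.php"))
# Follow-up request: the host is now www.increa.com, so the rule is skipped.
print(subdomain_redirect("www.increa.com", "wiki/index.php"))
```

The second call returns None, which is the point: once the host is www.increa.com, neither condition matches and the redirect chain stops.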

Solution for me

(later update:  both pages are replaced with http://www.increa.com/astronaut-job-interview)

I'd like to turn /home/public_html/NASA2000/ into http://www.increa.com/nasa2000/. Here are the .htaccess directives to accomplish that task:
RewriteEngine on
RewriteRule ^.*NASA(.*)$ http://www.increa.com/nasa$1 [R=301,L]

The regular expression says:
Starting from the start (^), accept any characters any number of times prior to "NASA".
After NASA, find any characters any number of times, and remember them all the way to the end ($). Then build a new URL from the base domain name, add "nasa" on the end, and stick all the remembered characters ($1) back on after that.

So for example,

/home/public_html/NASA2000/image001.jpg

turns into

http://www.increa.com/nasa2000/image001.jpg
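The same pattern can be tried out in Python, whose regular expressions behave the same way as Apache's for this simple case. This is only a sketch for experimenting with the pattern; the function name rewrite is made up for illustration.

```python
import re

# Same pattern as the RewriteRule; group 1 corresponds to $1.
RULE = re.compile(r"^.*NASA(.*)$")


def rewrite(path):
    """Return the redirect URL, or None if the pattern does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    return "http://www.increa.com/nasa" + match.group(1)


print(rewrite("/home/public_html/NASA2000/image001.jpg"))
# prints http://www.increa.com/nasa2000/image001.jpg
print(rewrite("/home/public_html/nasa2000/image001.jpg"))
# prints None
```

The second call shows why the redirect does not loop: the pattern is case-sensitive, so the lower-case "nasa" path is left alone.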

Scope (in addition to syntax)

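This works great if the .htaccess is put in the directory above the NASA directories. If I include the .htaccess file in both places, the substitution fails, and I haven't figured out for certain why. The server just returns whatever it finds in the NASA directory (index.htm, image001.jpg, or whatever). A likely explanation: by default, mod_rewrite directives in a subdirectory's .htaccess replace the ones from the parent directory rather than adding to them, and inside the NASA directory the pattern is matched against the path relative to that directory, which no longer contains "NASA".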

<html>
<head>
<title>Second mod_rewrite example</title>
</head>
<body>
<p>
The requested page was:
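<?php echo htmlspecialchars(isset($_GET['page']) ? $_GET['page'] : '', ENT_QUOTES, 'UTF-8'); // escape user input before printing it ?>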
</p>
</body>
</html>


Created by brian. Last Modification: Saturday 18 of December, 2010 01:32:41 CST by admin.