1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
4<html xmlns="http://www.w3.org/1999/xhtml">
5  <head>
6    <meta name="generator" content="HTML Tidy, see www.w3.org" />
7
8    <title>Apache 1.3 URL Rewriting Guide</title>
9  </head>
10  <!-- Background white, links blue (unvisited), navy (visited), red (active) -->
11
12  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
13  vlink="#000080" alink="#FF0000">
14    <blockquote>
15          <div align="CENTER">
16      <img src="../images/sub.gif" alt="[APACHE DOCUMENTATION]" />
17
18      <h3>Apache HTTP Server Version 1.3</h3>
19    </div>
20
21
22      <div align="CENTER">
23        <h1>Apache 1.3<br />
24         URL Rewriting Guide<br />
25        </h1>
26
27        <address>
28          Originally written by<br />
29           Ralf S. Engelschall &lt;rse@apache.org&gt;<br />
30           December 1997
31        </address>
32      </div>
33
34      <p>This document supplements the mod_rewrite <a
35      href="../mod/mod_rewrite.html">reference documentation</a>.
36      It describes how one can use Apache's mod_rewrite to solve
37      typical URL-based problems with which webmasters are often
38      confronted. We give detailed descriptions on how to
39      solve each problem by configuring URL rewriting rulesets.</p>
40
41      <h2><a id="ToC1" name="ToC1">Introduction to
42      mod_rewrite</a></h2>
43      The Apache module mod_rewrite is a killer one, i.e. it is a
44      really sophisticated module which provides a powerful way to
45      do URL manipulations. With it you can do nearly all types of
46      URL manipulations you ever dreamed about. The price you have
47      to pay is to accept complexity, because mod_rewrite's major
48      drawback is that it is not easy to understand and use for the
49      beginner. And even Apache experts sometimes discover new
50      aspects where mod_rewrite can help.
51
      <p>In other words: With mod_rewrite you either shoot yourself
      in the foot the first time and never use it again, or love it
      for the rest of your life because of its power. This paper
      tries to give you a few early successes, so that you avoid the
      first case, by presenting ready-made solutions.</p>
58
59      <h2><a id="ToC2" name="ToC2">Practical Solutions</a></h2>
      Here come a lot of practical solutions I've either invented
      myself or collected from other people's solutions in the past.
      Feel free to learn the black magic of URL rewriting from
      these examples.
64
65      <table bgcolor="#FFE0E0" border="0" cellspacing="0"
66      cellpadding="5">
67        <tr>
68          <td>ATTENTION: Depending on your server-configuration it
69          can be necessary to slightly change the examples for your
70          situation, e.g. adding the [PT] flag when additionally
71          using mod_alias and mod_userdir, etc. Or rewriting a
72          ruleset to fit in <code>.htaccess</code> context instead
73          of per-server context. Always try to understand what a
74          particular ruleset really does before you use it in order to
75          avoid problems.</td>
76        </tr>
77      </table>
78
79      <h1>URL Layout</h1>
80
81      <h2>Canonical URLs</h2>
82
83      <dl>
84        <dt><strong>Description:</strong></dt>
85
        <dd>On some webservers there is more than one URL for a
        resource. Usually there are canonical URLs (which should
        actually be used and distributed) and those which are just
        shortcuts, internal ones, etc. Independent of which URL the
        user supplied with the request, he should finally see only
        the canonical one.</dd>
92
93        <dt><strong>Solution:</strong></dt>
94
95        <dd>
96          We do an external HTTP redirect for all non-canonical
97          URLs to fix them in the location view of the Browser and
98          for all subsequent requests. In the example ruleset below
99          we replace <code>/~user</code> by the canonical
100          <code>/u/user</code> and fix a missing trailing slash for
101          <code>/u/user</code>.
102
103          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
104          cellpadding="5">
105            <tr>
106              <td>
<pre>
RewriteRule   ^/<strong>~</strong>([^/]+)/?(.*)    /<strong>u</strong>/$1/$2  [<strong>R</strong>]
RewriteRule   ^/([uge])/(<strong>[^/]+</strong>)$  /$1/$2<strong>/</strong>   [<strong>R</strong>]
</pre>
111              </td>
112            </tr>
113          </table>
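
          <p>The rules above issue the default temporary redirect
          (302). As a sketch only, assuming the move to
          <code>/u/user</code> is permanent, the same rules can
          request a 301 response by giving the <code>R</code> flag
          an explicit status code. The added <code>L</code> flag
          is likewise an assumption that no later rule should
          touch the redirected URL:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#  assumption: the new location is permanent, so answer with 301
RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R=301,L]
RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R=301,L]
</pre>
              </td>
            </tr>
          </table>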
114        </dd>
115      </dl>
116
117      <h2>Canonical Hostnames</h2>
118
119      <dl>
120        <dt><strong>Description:</strong></dt>
121
122        <dd>The goal of this rule is to force the use of a particular
123        hostname, in preference to other hostnames which may be used to
124        reach the same site. For example, if you wish to force the use
125        of <strong>www.example.com</strong> instead of
126        <strong>example.com</strong>, you might use a variant of the
127        following recipe.</dd>
128
129        <dt><strong>Solution:</strong></dt>
130
131        <dd>
132          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
133          cellpadding="5">
134            <tr>
135              <td>
<pre>
# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]

# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]
</pre>
148              </td>
149            </tr>
150          </table>
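
          <p>As a concrete illustration of the case named in the
          description, the following sketch fills in
          <code>www.example.com</code> for a site on port 80. It
          assumes per-server context and that no other hostname
          should remain reachable; adapt it to your own setup
          before use:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
#  any Host: header other than www.example.com (for instance
#  plain example.com) is sent to the canonical name;
#  requests without a Host: header are left alone
RewriteCond %{HTTP_HOST}   !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://www.example.com/$1 [L,R]
</pre>
              </td>
            </tr>
          </table>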
151        </dd>
152      </dl>
153
154      <h2>Moved DocumentRoot</h2>
155
156      <dl>
157        <dt><strong>Description:</strong></dt>
158
159        <dd>Usually the DocumentRoot of the webserver directly
160        relates to the URL ``<code>/</code>''. But often this data
161        is not really of top-level priority, it is perhaps just one
162        entity of a lot of data pools. For instance at our Intranet
163        sites there are <code>/e/www/</code> (the homepage for
164        WWW), <code>/e/sww/</code> (the homepage for the Intranet)
165        etc. Now because the data of the DocumentRoot stays at
166        <code>/e/www/</code> we had to make sure that all inlined
167        images and other stuff inside this data pool work for
168        subsequent requests.</dd>
169
170        <dt><strong>Solution:</strong></dt>
171
172        <dd>
          We just redirect the URL <code>/</code> to
          <code>/e/www/</code>. While it seems trivial, it is
          actually trivial only with mod_rewrite, because the
          typical old mechanisms of URL <em>Aliases</em> (as
          provided by mod_alias and friends) only used
          <em>prefix</em> matching. With this you cannot do such a
          redirection because the DocumentRoot is a prefix of all
          URLs. With mod_rewrite it is really trivial:
181
182          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
183          cellpadding="5">
184            <tr>
185              <td>
<pre>
RewriteEngine on
RewriteRule   <strong>^/$</strong>  /e/www/  [<strong>R</strong>]
</pre>
190              </td>
191            </tr>
192          </table>
193        </dd>
194      </dl>
195
196      <h2>Trailing Slash Problem</h2>
197
198      <dl>
199        <dt><strong>Description:</strong></dt>
200
        <dd>Every webmaster can sing a song about the problem of
        the trailing slash on URLs referencing directories. If it
        is missing, the server dumps an error, because if you say
        <code>/~quux/foo</code> instead of <code>/~quux/foo/</code>
        then the server searches for a <em>file</em> named
        <code>foo</code>. And because this file is a directory it
        complains. Actually it tries to fix this itself in most
        cases, but sometimes this mechanism needs to be emulated
        by you, for instance after you have done a lot of
        complicated URL rewritings to CGI scripts etc.</dd>
211
212        <dt><strong>Solution:</strong></dt>
213
214        <dd>
215          The solution to this subtle problem is to let the server
216          add the trailing slash automatically. To do this
217          correctly we have to use an external redirect, so the
          browser correctly requests subsequent images etc. If we
          only did an internal rewrite, this would only work for the
220          directory page, but would go wrong when any images are
221          included into this page with relative URLs, because the
222          browser would request an in-lined object. For instance, a
223          request for <code>image.gif</code> in
224          <code>/~quux/foo/index.html</code> would become
225          <code>/~quux/image.gif</code> without the external
226          redirect!
227
228          <p>So, to do this trick we write:</p>
229
230          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
231          cellpadding="5">
232            <tr>
233              <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo<strong>$</strong>  foo<strong>/</strong>  [<strong>R</strong>]
</pre>
239              </td>
240            </tr>
241          </table>
242
243          <p>The crazy and lazy can even do the following in the
244          top-level <code>.htaccess</code> file of their homedir.
245          But notice that this creates some processing
246          overhead.</p>
247
248          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
249          cellpadding="5">
250            <tr>
251              <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteCond    %{REQUEST_FILENAME}  <strong>-d</strong>
RewriteRule    ^(.+<strong>[^/]</strong>)$           $1<strong>/</strong>  [R]
</pre>
258              </td>
259            </tr>
260          </table>
261        </dd>
262      </dl>
263
264      <h2>Webcluster through Homogeneous URL Layout</h2>
265
266      <dl>
267        <dt><strong>Description:</strong></dt>
268
        <dd>We want to create a homogeneous and consistent URL
        layout over all WWW servers on an Intranet webcluster, i.e.
        all URLs (per definition server local and thus server
        dependent!) become actually server <em>independent</em>!
        What we want is to give the WWW namespace a consistent
        server-independent layout: no URL should have to include
        any physically correct target server. The cluster itself
        should drive us automatically to the physical target
        host.</dd>
278
279        <dt><strong>Solution:</strong></dt>
280
281        <dd>
          First, the knowledge of the target servers comes from
          (distributed) external maps which contain information
          on where our users, groups and entities stay. They have
          the form
<pre>
user1  server_of_user1
user2  server_of_user2
:      :
</pre>
291
292          <p>We put them into files <code>map.xxx-to-host</code>.
293          Second we need to instruct all servers to redirect URLs
294          of the forms</p>
<pre>
/u/user/anypath
/g/group/anypath
/e/entity/anypath
</pre>
300
301          <p>to</p>
<pre>
http://physical-host/u/user/anypath
http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath
</pre>
307
308          <p>when the URL is not locally valid to a server. The
309          following ruleset does this for us by the help of the map
310          files (assuming that server0 is a default server which
311          will be used if a user has no entry in the map):</p>
312
313          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
314          cellpadding="5">
315            <tr>
316              <td>
<pre>
RewriteEngine on

RewriteMap      user-to-host   txt:/path/to/map.user-to-host
RewriteMap     group-to-host   txt:/path/to/map.group-to-host
RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host

RewriteRule   ^/u/<strong>([^/]+)</strong>/?(.*)   http://<strong>${user-to-host:$1|server0}</strong>/u/$1/$2
RewriteRule   ^/g/<strong>([^/]+)</strong>/?(.*)  http://<strong>${group-to-host:$1|server0}</strong>/g/$1/$2
RewriteRule   ^/e/<strong>([^/]+)</strong>/?(.*) http://<strong>${entity-to-host:$1|server0}</strong>/e/$1/$2

RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
</pre>
331              </td>
332            </tr>
333          </table>
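
          <p>To make the lookup concrete, here is a small sketch of
          what <code>map.user-to-host</code> might contain. The
          user names <code>alice</code> and <code>bob</code> and
          the hosts <code>server1</code> and <code>server2</code>
          are hypothetical and only serve as an illustration:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##  map.user-to-host -- hypothetical sample entries
alice   server1
bob     server2
</pre>
              </td>
            </tr>
          </table>

          <p>With these entries the first rule would turn a request
          for <code>/u/alice/pub/index.html</code> into
          <code>http://server1/u/alice/pub/index.html</code>. A
          request for <code>/u/carol/</code>, where
          <code>carol</code> has no entry in the map, would fall
          back to the default and become
          <code>http://server0/u/carol/</code>.</p>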
334        </dd>
335      </dl>
336
337      <h2>Move Homedirs to Different Webserver</h2>
338
339      <dl>
340        <dt><strong>Description:</strong></dt>
341
        <dd>Many webmasters have asked for a solution to the
        following situation: They wanted to redirect all
        homedirs on a webserver to another webserver. They usually
        need such things when establishing a newer webserver which
        will replace the old one over time.</dd>
347
348        <dt><strong>Solution:</strong></dt>
349
350        <dd>
351          The solution is trivial with mod_rewrite. On the old
352          webserver we just redirect all
353          <code>/~user/anypath</code> URLs to
354          <code>http://newserver/~user/anypath</code>.
355
356          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
357          cellpadding="5">
358            <tr>
359              <td>
<pre>
RewriteEngine on
RewriteRule   ^/~(.+)  http://<strong>newserver</strong>/~$1  [R,L]
</pre>
364              </td>
365            </tr>
366          </table>
367        </dd>
368      </dl>
369
370      <h2>Structured Homedirs</h2>
371
372      <dl>
373        <dt><strong>Description:</strong></dt>
374
        <dd>Some sites with thousands of users usually use a
376        structured homedir layout, i.e. each homedir is in a
377        subdirectory which begins for instance with the first
378        character of the username. So, <code>/~foo/anypath</code>
379        is <code>/home/<strong>f</strong>/foo/.www/anypath</code>
380        while <code>/~bar/anypath</code> is
381        <code>/home/<strong>b</strong>/bar/.www/anypath</code>.</dd>
382
383        <dt><strong>Solution:</strong></dt>
384
385        <dd>
386          We use the following ruleset to expand the tilde URLs
387          into exactly the above layout.
388
389          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
390          cellpadding="5">
391            <tr>
392              <td>
<pre>
RewriteEngine on
RewriteRule   ^/~(<strong>([a-z])</strong>[a-z0-9]+)(.*)  /home/<strong>$2</strong>/$1/.www$3
</pre>
397              </td>
398            </tr>
399          </table>
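
          <p>Because the pattern nests one pair of parentheses
          inside another, it may help to trace how the
          backreferences are numbered (by opening parenthesis,
          from left to right). The annotated copy below is only a
          walk-through of the rule above for the URL
          <code>/~foo/anypath</code>, not a different
          solution:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#  for the request /~foo/anypath the pattern captures:
#    $1 = foo        (outer group: the whole username)
#    $2 = f          (inner group: its first character)
#    $3 = /anypath   (the rest of the URL)
#  so the result is /home/f/foo/.www/anypath
RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
</pre>
              </td>
            </tr>
          </table>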
400        </dd>
401      </dl>
402
403      <h2>Filesystem Reorganisation</h2>
404
405      <dl>
406        <dt><strong>Description:</strong></dt>
407
408        <dd>
409          This really is a hardcore example: a killer application
410          which heavily uses per-directory
411          <code>RewriteRules</code> to get a smooth look and feel
412          on the Web while its data structure is never touched or
413          adjusted. Background: <strong><em>net.sw</em></strong> is
414          my archive of freely available Unix software packages,
          which I started to collect in 1992. It is both my hobby
          and job to do this, because while I'm studying computer
417          science I have also worked for many years as a system and
418          network administrator in my spare time. Every week I need
419          some sort of software so I created a deep hierarchy of
420          directories where I stored the packages:
<pre>
drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
</pre>
439
440          <p>In July 1996 I decided to make this archive public to
441          the world via a nice Web interface. "Nice" means that I
442          wanted to offer an interface where you can browse
443          directly through the archive hierarchy. And "nice" means
          that I didn't want to change anything inside this
445          hierarchy - not even by putting some CGI scripts at the
446          top of it. Why? Because the above structure should be
447          later accessible via FTP as well, and I didn't want any
448          Web or CGI stuff to be there.</p>
449        </dd>
450
451        <dt><strong>Solution:</strong></dt>
452
453        <dd>
454          The solution has two parts: The first is a set of CGI
455          scripts which create all the pages at all directory
456          levels on-the-fly. I put them under
457          <code>/e/netsw/.www/</code> as follows:
<pre>
-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
</pre>
473
474          <p>The <code>DATA/</code> subdirectory holds the above
475          directory structure, i.e. the real
476          <strong><em>net.sw</em></strong> stuff and gets
477          automatically updated via <code>rdist</code> from time to
478          time. The second part of the problem remains: how to link
479          these two structures together into one smooth-looking URL
480          tree? We want to hide the <code>DATA/</code> directory
481          from the user while running the appropriate CGI scripts
482          for the various URLs. Here is the solution: first I put
483          the following into the per-directory configuration file
484          in the Document Root of the server to rewrite the
485          announced URL <code>/net.sw/</code> to the internal path
486          <code>/e/netsw</code>:</p>
487
488          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
489          cellpadding="5">
490            <tr>
491              <td>
<pre>
RewriteRule  ^net.sw$       net.sw/        [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1
</pre>
496              </td>
497            </tr>
498          </table>
499
500          <p>The first rule is for requests which miss the trailing
501          slash! The second rule does the real thing. And then
502          comes the killer configuration which stays in the
503          per-directory config file
504          <code>/e/netsw/.www/.wwwacl</code>:</p>
505
506          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
507          cellpadding="5">
508            <tr>
509              <td>
<pre>
Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi     [L]
RewriteRule   ^index\.html$            netsw-home.cgi     [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                  [L]
RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
RewriteRule   ^netsw-search\.cgi.*     -                  [L]
RewriteRule   ^netsw-tree\.cgi$        -                  [L]
RewriteRule   ^netsw-about\.html$      -                  [L]
RewriteRule   ^netsw-img/.*$           -                  [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1
</pre>
540              </td>
541            </tr>
542          </table>
543
544          <p>Some hints for interpretation:</p>
545
546          <ol>
            <li>Notice the L (last) flag and no substitution field
            ('-') in the fourth part</li>
549
550            <li>Notice the ! (not) character and the C (chain) flag
551            at the first rule in the last part</li>
552
553            <li>Notice the catch-all pattern in the last rule</li>
554          </ol>
555        </dd>
556      </dl>
557
558      <h2>NCSA imagemap to Apache mod_imap</h2>
559
560      <dl>
561        <dt><strong>Description:</strong></dt>
562
563        <dd>When switching from the NCSA webserver to the more
564        modern Apache webserver a lot of people want a smooth
565        transition. So they want pages which use their old NCSA
566        <code>imagemap</code> program to work under Apache with the
567        modern <code>mod_imap</code>. The problem is that there are
568        a lot of hyperlinks around which reference the
569        <code>imagemap</code> program via
570        <code>/cgi-bin/imagemap/path/to/page.map</code>. Under
571        Apache this has to read just
572        <code>/path/to/page.map</code>.</dd>
573
574        <dt><strong>Solution:</strong></dt>
575
576        <dd>
577          We use a global rule to remove the prefix on-the-fly for
578          all requests:
579
580          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
581          cellpadding="5">
582            <tr>
583              <td>
<pre>
RewriteEngine  on
RewriteRule    ^/cgi-bin/imagemap(.*)  $1  [PT]
</pre>
588              </td>
589            </tr>
590          </table>
591        </dd>
592      </dl>
593
594      <h2>Search pages in more than one directory</h2>
595
596      <dl>
597        <dt><strong>Description:</strong></dt>
598
        <dd>Sometimes it is necessary to let the webserver search
        for pages in more than one directory. Here MultiViews or
        other techniques cannot help.</dd>
602
603        <dt><strong>Solution:</strong></dt>
604
605        <dd>
          We program an explicit ruleset which searches for the
          files in the directories.
608
609          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
610          cellpadding="5">
611            <tr>
612              <td>
<pre>
RewriteEngine on

#   first try to find it in dir1/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/<strong>dir1</strong>/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/<strong>dir1</strong>/$1  [L]

#   second try to find it in dir2/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/<strong>dir2</strong>/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/<strong>dir2</strong>/$1  [L]

#   else go on for other Alias or ScriptAlias directives,
#   etc.
RewriteRule   ^(.+)  -  [PT]
</pre>
630              </td>
631            </tr>
632          </table>
633        </dd>
634      </dl>
635
636      <h2>Set Environment Variables According To URL Parts</h2>
637
638      <dl>
639        <dt><strong>Description:</strong></dt>
640
641        <dd>Perhaps you want to keep status information between
642        requests and use the URL to encode it. But you don't want
643        to use a CGI wrapper for all pages just to strip out this
644        information.</dd>
645
646        <dt><strong>Solution:</strong></dt>
647
648        <dd>
649          We use a rewrite rule to strip out the status information
650          and remember it via an environment variable which can be
651          later dereferenced from within XSSI or CGI. This way a
652          URL <code>/foo/S=java/bar/</code> gets translated to
653          <code>/foo/bar/</code> and the environment variable named
654          <code>STATUS</code> is set to the value "java".
655
656          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
657          cellpadding="5">
658            <tr>
659              <td>
<pre>
RewriteEngine on
RewriteRule   ^(.*)/<strong>S=([^/]+)</strong>/(.*)    $1/$3 [E=<strong>STATUS:$2</strong>]
</pre>
664              </td>
665            </tr>
666          </table>
667        </dd>
668      </dl>
669
670      <h2>Virtual User Hosts</h2>
671
672      <dl>
673        <dt><strong>Description:</strong></dt>
674
675        <dd>Assume that you want to provide
        <code>www.<strong>username</strong>.host.com</code>
677        for the homepage of username via just DNS A records to the
678        same machine and without any virtualhosts on this
679        machine.</dd>
680
681        <dt><strong>Solution:</strong></dt>
682
683        <dd>
684          For HTTP/1.0 requests there is no solution, but for
685          HTTP/1.1 requests which contain a Host: HTTP header we
686          can use the following ruleset to rewrite
687          <code>http://www.username.host.com/anypath</code>
688          internally to <code>/home/username/anypath</code>:
689
690          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
691          cellpadding="5">
692            <tr>
693              <td>
<pre>
RewriteEngine on
RewriteCond   %{<strong>HTTP_HOST</strong>}                 ^www\.<strong>[^.]+</strong>\.host\.com$
RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
RewriteRule   ^www\.<strong>([^.]+)</strong>\.host\.com(.*) /home/<strong>$1</strong>$2
</pre>
700              </td>
701            </tr>
702          </table>
703        </dd>
704      </dl>
705
706      <h2>Redirect Homedirs For Foreigners</h2>
707
708      <dl>
709        <dt><strong>Description:</strong></dt>
710
        <dd>We want to redirect homedir URLs to another webserver
        <code>www.somewhere.com</code> when the requesting user
        is not in the local domain
        <code>ourdomain.com</code>. This is sometimes used in
        virtual host contexts.</dd>
716
717        <dt><strong>Solution:</strong></dt>
718
719        <dd>
720          Just a rewrite condition:
721
722          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
723          cellpadding="5">
724            <tr>
725              <td>
<pre>
RewriteEngine on
RewriteCond   %{REMOTE_HOST}  <strong>!^.+\.ourdomain\.com$</strong>
RewriteRule   ^/(~.+)         http://www.somewhere.com/$1 [R,L]
</pre>
731              </td>
732            </tr>
733          </table>
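
          <p>The condition above depends on
          <code>REMOTE_HOST</code> containing a hostname, which is
          typically only the case when the server resolves client
          addresses. As a hedged alternative sketch, assuming
          (hypothetically) that all local clients use addresses
          beginning with <code>192.168.</code>, the test can be
          made against <code>REMOTE_ADDR</code> instead:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
#  assumption: local clients have addresses 192.168.x.x;
#  everybody else is treated as a foreigner
RewriteCond   %{REMOTE_ADDR}  !^192\.168\.
RewriteRule   ^/(~.+)         http://www.somewhere.com/$1 [R,L]
</pre>
              </td>
            </tr>
          </table>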
734        </dd>
735      </dl>
736
737      <h2>Redirect Failing URLs To Other Webserver</h2>
738
739      <dl>
740        <dt><strong>Description:</strong></dt>
741
        <dd>A typical FAQ about URL rewriting is how to redirect
        failing requests on webserver A to webserver B. Usually
        this is done via ErrorDocument CGI-scripts in Perl, but
        there is also a mod_rewrite solution. But notice that this
        performs worse than using an ErrorDocument
        CGI-script!</dd>
748
749        <dt><strong>Solution:</strong></dt>
750
751        <dd>
752          The first solution has the best performance but less
753          flexibility and is less error safe:
754
755          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
756          cellpadding="5">
757            <tr>
758              <td>
<pre>
RewriteEngine on
RewriteCond   /your/docroot/%{REQUEST_FILENAME} <strong>!-f</strong>
RewriteRule   ^(.+)                             http://<strong>webserverB</strong>.dom/$1
</pre>
764              </td>
765            </tr>
766          </table>
767
          <p>The problem here is that this will only work for pages
          inside the DocumentRoot. While you can add more
          Conditions (for instance to also handle homedirs, etc.)
          there is a better variant:</p>
772
773          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
774          cellpadding="5">
775            <tr>
776              <td>
<pre>
RewriteEngine on
RewriteCond   %{REQUEST_URI} <strong>!-U</strong>
RewriteRule   ^(.+)          http://<strong>webserverB</strong>.dom/$1
</pre>
782              </td>
783            </tr>
784          </table>
785
          <p>This uses the URL look-ahead feature of mod_rewrite.
          The result is that this will work for all types of URLs
          and is a safe way. But it has a performance impact on
          the webserver, because for every request there is one
          more internal subrequest. So, if your webserver runs on a
          powerful CPU, use this one. If it is a slow machine, use
          the first approach or, better, an ErrorDocument
          CGI-script.</p>
794        </dd>
795      </dl>
796
797      <h2>Extended Redirection</h2>
798
799      <dl>
800        <dt><strong>Description:</strong></dt>
801
802        <dd>Sometimes we need more control (concerning the
803        character escaping mechanism) of URLs on redirects. Usually
        the Apache kernel's URL escape function also escapes
805        anchors, i.e. URLs like "url#anchor". You cannot use this
806        directly on redirects with mod_rewrite because the
807        uri_escape() function of Apache would also escape the hash
808        character. How can we redirect to such a URL?</dd>
809
810        <dt><strong>Solution:</strong></dt>
811
812        <dd>
          We have to use a kludge: an NPH-CGI script which does
          the redirect itself, because no escaping is done there
          (NPH=non-parseable headers). First we introduce a
          new URL scheme <code>xredirect:</code> by the following
          per-server config-line (should be one of the last rewrite
          rules):
819
820          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
821          cellpadding="5">
822            <tr>
823              <td>
<pre>
RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \
            [T=application/x-httpd-cgi,L]
</pre>
828              </td>
829            </tr>
830          </table>
831
          <p>This forces all URLs prefixed with
          <code>xredirect:</code> to be piped through the
          <code>nph-xredirect.cgi</code> program. The program
          itself looks like this:</p>
836
837          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
838          cellpadding="5">
839            <tr>
840              <td>
841<pre>
842#!/path/to/perl
843##
844##  nph-xredirect.cgi -- NPH/CGI script for extended redirects
845##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
846##
847
848$| = 1;
849$url = $ENV{'PATH_INFO'};
850
851print "HTTP/1.0 302 Moved Temporarily\n";
852print "Server: $ENV{'SERVER_SOFTWARE'}\n";
853print "Location: $url\n";
854print "Content-type: text/html\n";
855print "\n";
856print "&lt;html&gt;\n";
857print "&lt;head&gt;\n";
858print "&lt;title&gt;302 Moved Temporarily (EXTENDED)&lt;/title&gt;\n";
859print "&lt;/head&gt;\n";
860print "&lt;body&gt;\n";
861print "&lt;h1&gt;Moved Temporarily (EXTENDED)&lt;/h1&gt;\n";
862print "The document has moved &lt;a HREF=\"$url\"&gt;here&lt;/a&gt;.&lt;p&gt;\n";
863print "&lt;/body&gt;\n";
864print "&lt;/html&gt;\n";
865
866##EOF##
867</pre>
868              </td>
869            </tr>
870          </table>
871
          <p>This provides you with the functionality to do
          redirects to all URL schemes, i.e. including those
          which are not directly accepted by mod_rewrite. For
          instance you can now also redirect to
          <code>news:newsgroup</code> via</p>
877
878          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
879          cellpadding="5">
880            <tr>
881              <td>
882<pre>
883RewriteRule ^anyurl  xredirect:news:newsgroup
884</pre>
885              </td>
886            </tr>
887          </table>
888
          <p>Notice: You must not put [R] or [R,L] on the above
          rule because the <code>xredirect:</code> prefix needs to
          be expanded later by our special "pipe through" rule
          above.</p>
893        </dd>
894      </dl>
895
896      <h2>Archive Access Multiplexer</h2>
897
898      <dl>
899        <dt><strong>Description:</strong></dt>
900
        <dd>Do you know the great CPAN (Comprehensive Perl Archive
        Network) under <a
        href="http://www.perl.com/CPAN">http://www.perl.com/CPAN</a>?
        This does a redirect to one of several FTP servers around
        the world which carry a CPAN mirror and are located
        approximately near the requesting client. Actually this
        can be called an FTP access multiplexing service. While
        CPAN runs via CGI scripts, how can a similar approach be
        implemented via mod_rewrite?</dd>
910
911        <dt><strong>Solution:</strong></dt>
912
913        <dd>
914          First we notice that from version 3.0.0 mod_rewrite can
915          also use the "ftp:" scheme on redirects. And second, the
916          location approximation can be done by a rewritemap over
917          the top-level domain of the client. With a tricky chained
918          ruleset we can use this top-level domain as a key to our
919          multiplexing map.
920
921          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
922          cellpadding="5">
923            <tr>
924              <td>
925<pre>
926RewriteEngine on
927RewriteMap    multiplex                txt:/path/to/map.cxan
928RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
929RewriteRule   ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$  ${multiplex:<strong>$1</strong>|ftp.default.dom}$2  [R,L]
930</pre>
931              </td>
932            </tr>
933          </table>
934
935          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
936          cellpadding="5">
937            <tr>
938              <td>
939<pre>
940##
941##  map.cxan -- Multiplexing Map for CxAN
942##
943
944de        ftp://ftp.cxan.de/CxAN/
945uk        ftp://ftp.cxan.uk/CxAN/
946com       ftp://ftp.cxan.com/CxAN/
947 :
948##EOF##
949</pre>
950              </td>
951            </tr>
952          </table>
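
          <p>The second rule above can only match when
          <code>%{REMOTE_HOST}</code> expands to a hostname and not
          to a numeric IP address. A minimal sketch, assuming DNS
          lookups are currently switched off in your server
          configuration, is to enable them with the core
          <code>HostnameLookups</code> directive:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#   assumption: lookups are off, so REMOTE_HOST would be an IP address
HostnameLookups on
</pre>
              </td>
            </tr>
          </table>

          <p>Keep in mind that this adds DNS lookups to the request
          processing, so weigh the cost against the benefit of the
          multiplexer.</p>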
953        </dd>
954      </dl>
955
      <h2>Time-Dependent Rewriting</h2>
957
958      <dl>
959        <dt><strong>Description:</strong></dt>
960
        <dd>When it comes to tricks like time-dependent content, a
        lot of webmasters still use CGI scripts which do, for
        instance, redirects to specialized pages. How can it be done
        via mod_rewrite?</dd>
965
966        <dt><strong>Solution:</strong></dt>
967
968        <dd>
          There are a lot of variables named <code>TIME_xxx</code>
          for rewrite conditions. In conjunction with the special
          lexicographic comparison patterns &lt;STRING, &gt;STRING
          and =STRING we can do time-dependent redirects:
973
974          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
975          cellpadding="5">
976            <tr>
977              <td>
978<pre>
979RewriteEngine on
980RewriteCond   %{TIME_HOUR}%{TIME_MIN} &gt;0700
981RewriteCond   %{TIME_HOUR}%{TIME_MIN} &lt;1900
982RewriteRule   ^foo\.html$             foo.day.html
983RewriteRule   ^foo\.html$             foo.night.html
984</pre>
985              </td>
986            </tr>
987          </table>
988
989          <p>This provides the content of <code>foo.day.html</code>
990          under the URL <code>foo.html</code> from 07:00-19:00 and
991          at the remaining time the contents of
992          <code>foo.night.html</code>. Just a nice feature for a
993          homepage...</p>
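
          <p>The other <code>TIME_xxx</code> variables can be used
          in the same way. As a sketch, assuming a hypothetical
          page <code>foo.weekend.html</code> and that
          <code>%{TIME_WDAY}</code> counts the days of the week
          from 0 (Sunday) to 6 (Saturday), the following ruleset
          would serve a special page on weekends:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
#   0 = Sunday, 6 = Saturday (assumed numbering)
RewriteCond   %{TIME_WDAY} =0              [OR]
RewriteCond   %{TIME_WDAY} =6
#   foo.weekend.html is a hypothetical file name
RewriteRule   ^foo\.html$  foo.weekend.html
</pre>
              </td>
            </tr>
          </table>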
994        </dd>
995      </dl>
996
997      <h2>Backward Compatibility for YYYY to XXXX migration</h2>
998
999      <dl>
1000        <dt><strong>Description:</strong></dt>
1001
1002        <dd>How can we make URLs backward compatible (still
1003        existing virtually) after migrating document.YYYY to
1004        document.XXXX, e.g. after translating a bunch of .html
1005        files to .phtml?</dd>
1006
1007        <dt><strong>Solution:</strong></dt>
1008
1009        <dd>
1010          We just rewrite the name to its basename and test for
1011          existence of the new extension. If it exists, we take
1012          that name, else we rewrite the URL to its original state.
1013
1014
1015          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1016          cellpadding="5">
1017            <tr>
1018              <td>
1019<pre>
1020#   backward compatibility ruleset for
1021#   rewriting document.html to document.phtml
1022#   when and only when document.phtml exists
1023#   but no longer document.html
1024RewriteEngine on
1025RewriteBase   /~quux/
1026#   parse out basename, but remember the fact
1027RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
1028#   rewrite to document.phtml if exists
1029RewriteCond   %{REQUEST_FILENAME}.phtml -f
1030RewriteRule   ^(.*)$ $1.phtml                   [S=1]
1031#   else reverse the previous basename cutout
1032RewriteCond   %{ENV:WasHTML}            ^yes$
1033RewriteRule   ^(.*)$ $1.html
1034</pre>
1035              </td>
1036            </tr>
1037          </table>
1038        </dd>
1039      </dl>
1040
1041      <h1>Content Handling</h1>
1042
1043      <h2>From Old to New (intern)</h2>
1044
1045      <dl>
1046        <dt><strong>Description:</strong></dt>
1047
        <dd>Assume we have recently renamed the page
        <code>foo.html</code> to <code>bar.html</code> and now want
        to provide the old URL for backward compatibility. In fact,
        we do not want users of the old URL to even notice that
        the page was renamed.</dd>
1053
1054        <dt><strong>Solution:</strong></dt>
1055
1056        <dd>
1057          We rewrite the old URL to the new one internally via the
1058          following rule:
1059
1060          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1061          cellpadding="5">
1062            <tr>
1063              <td>
1064<pre>
1065RewriteEngine  on
1066RewriteBase    /~quux/
1067RewriteRule    ^<strong>foo</strong>\.html$  <strong>bar</strong>.html
1068</pre>
1069              </td>
1070            </tr>
1071          </table>
1072        </dd>
1073      </dl>
1074
1075      <h2>From Old to New (extern)</h2>
1076
1077      <dl>
1078        <dt><strong>Description:</strong></dt>
1079
        <dd>Assume again that we have recently renamed the page
        <code>foo.html</code> to <code>bar.html</code> and now want
        to provide the old URL for backward compatibility. But this
        time we want the users of the old URL to be pointed to
        the new one, i.e. their browser's Location field should
        change, too.</dd>
1086
1087        <dt><strong>Solution:</strong></dt>
1088
1089        <dd>
          We force an HTTP redirect to the new URL, which changes
          the URL shown by the browser and thus the user's view:
1092
1093          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1094          cellpadding="5">
1095            <tr>
1096              <td>
1097<pre>
1098RewriteEngine  on
1099RewriteBase    /~quux/
1100RewriteRule    ^<strong>foo</strong>\.html$  <strong>bar</strong>.html  [<strong>R</strong>]
1101</pre>
1102              </td>
1103            </tr>
1104          </table>
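
          <p>A plain [R] flag sends a temporary redirect (HTTP
          status 302). If the rename is meant to be final, the flag
          also accepts an explicit status. The following variant is
          a sketch under that assumption, reusing the file names
          from the rule above:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
#   R=permanent sends status 301 instead of the default 302
RewriteRule    ^foo\.html$  bar.html  [R=permanent,L]
</pre>
              </td>
            </tr>
          </table>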
1105        </dd>
1106      </dl>
1107
      <h2>Browser Dependent Content</h2>
1109
1110      <dl>
1111        <dt><strong>Description:</strong></dt>
1112
        <dd>At least for important top-level pages it is sometimes
        necessary to provide the optimum of browser dependent
        content, i.e. one has to provide a maximum version for the
        latest Netscape variants, a minimum version for the Lynx
        browsers and an average feature version for all others.</dd>
1118
1119        <dt><strong>Solution:</strong></dt>
1120
1121        <dd>
          We cannot use content negotiation because the browsers do
          not provide their type in that form. Instead we have to
          act on the HTTP header "User-Agent". The config below
          works as follows: If the HTTP header "User-Agent"
          begins with "Mozilla/3", the page <code>foo.html</code>
          is rewritten to <code>foo.NS.html</code> and the
          rewriting stops. If the browser is "Lynx" or "Mozilla" of
          version 1 or 2 the URL becomes <code>foo.20.html</code>.
          All other browsers receive page <code>foo.32.html</code>.
          This is done by the following ruleset:
1132
1133          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1134          cellpadding="5">
1135            <tr>
1136              <td>
1137<pre>
1138RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/3</strong>.*
1139RewriteRule ^foo\.html$         foo.<strong>NS</strong>.html          [<strong>L</strong>]
1140
1141RewriteCond %{HTTP_USER_AGENT}  ^<strong>Lynx/</strong>.*         [OR]
1142RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/[12]</strong>.*
1143RewriteRule ^foo\.html$         foo.<strong>20</strong>.html          [<strong>L</strong>]
1144
1145RewriteRule ^foo\.html$         foo.<strong>32</strong>.html          [<strong>L</strong>]
1146</pre>
1147              </td>
1148            </tr>
1149          </table>
1150        </dd>
1151      </dl>
1152
1153      <h2>Dynamic Mirror</h2>
1154
1155      <dl>
1156        <dt><strong>Description:</strong></dt>
1157
        <dd>Assume there are nice webpages on remote hosts we want
        to bring into our namespace. For FTP servers we would use
        the <code>mirror</code> program which actually maintains an
        explicit up-to-date copy of the remote data on the local
        machine. For a webserver we could use the program
        <code>webcopy</code> which acts similarly via HTTP. But both
        techniques have one major drawback: The local copy is
        only ever as up-to-date as the last run of the program. It
        would be much better if the mirror were not a static one we
        have to establish explicitly. Instead we want a dynamic
        mirror with data which gets updated automatically when
        there is need (updated data on the remote host).</dd>
1170
1171        <dt><strong>Solution:</strong></dt>
1172
1173        <dd>
1174          To provide this feature we map the remote webpage or even
1175          the complete remote webarea to our namespace by the use
1176          of the <i>Proxy Throughput</i> feature (flag [P]):
1177
1178          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1179          cellpadding="5">
1180            <tr>
1181              <td>
1182<pre>
1183RewriteEngine  on
1184RewriteBase    /~quux/
1185RewriteRule    ^<strong>hotsheet/</strong>(.*)$  <strong>http://www.tstimpreso.com/hotsheet/</strong>$1  [<strong>P</strong>]
1186</pre>
1187              </td>
1188            </tr>
1189          </table>
1190
1191          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1192          cellpadding="5">
1193            <tr>
1194              <td>
1195<pre>
1196RewriteEngine  on
1197RewriteBase    /~quux/
1198RewriteRule    ^<strong>usa-news\.html</strong>$   <strong>http://www.quux-corp.com/news/index.html</strong>  [<strong>P</strong>]
1199</pre>
1200              </td>
1201            </tr>
1202          </table>
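
          <p>The [P] flag hands the request over to mod_proxy, so
          that module has to be available in the server. A sketch
          for a DSO-based Apache 1.3 installation follows; the
          path <code>libexec/libproxy.so</code> is an assumption
          about your installation layout, and the
          <code>AddModule</code> line is only needed when your
          configuration uses <code>ClearModuleList</code>:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#   path is an assumption, adjust it to your installation
LoadModule  proxy_module  libexec/libproxy.so
AddModule   mod_proxy.c
</pre>
              </td>
            </tr>
          </table>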
1203        </dd>
1204      </dl>
1205
1206      <h2>Reverse Dynamic Mirror</h2>
1207
1208      <dl>
1209        <dt><strong>Description:</strong></dt>
1210
1211        <dd>...</dd>
1212
1213        <dt><strong>Solution:</strong></dt>
1214
1215        <dd>
1216          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1217          cellpadding="5">
1218            <tr>
1219              <td>
1220<pre>
1221RewriteEngine on
1222RewriteCond   /mirror/of/remotesite/$1           -U
1223RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
1224</pre>
1225              </td>
1226            </tr>
1227          </table>
1228        </dd>
1229      </dl>
1230
1231      <h2>Retrieve Missing Data from Intranet</h2>
1232
1233      <dl>
1234        <dt><strong>Description:</strong></dt>
1235
        <dd>This is a tricky way of virtually running a corporate
        (external) Internet webserver
        (<code>www.quux-corp.dom</code>), while actually keeping
        and maintaining its data on an (internal) Intranet webserver
        (<code>www2.quux-corp.dom</code>) which is protected by a
        firewall. The trick is that on the external webserver we
        retrieve the requested data on-the-fly from the internal
        one.</dd>
1244
1245        <dt><strong>Solution:</strong></dt>
1246
1247        <dd>
1248          First, we have to make sure that our firewall still
1249          protects the internal webserver and that only the
1250          external webserver is allowed to retrieve data from it.
1251          For a packet-filtering firewall we could for instance
1252          configure a firewall ruleset like the following:
1253
1254          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1255          cellpadding="5">
1256            <tr>
1257              <td>
1258<pre>
1259<strong>ALLOW</strong> Host www.quux-corp.dom Port &gt;1024 --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
1260<strong>DENY</strong>  Host *                 Port *     --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
1261</pre>
1262              </td>
1263            </tr>
1264          </table>
1265
1266          <p>Just adjust it to your actual configuration syntax.
1267          Now we can establish the mod_rewrite rules which request
1268          the missing data in the background through the proxy
1269          throughput feature:</p>
1270
1271          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1272          cellpadding="5">
1273            <tr>
1274              <td>
1275<pre>
1276RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
1277RewriteCond %{REQUEST_FILENAME}       <strong>!-f</strong>
1278RewriteCond %{REQUEST_FILENAME}       <strong>!-d</strong>
1279RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
1280</pre>
1281              </td>
1282            </tr>
1283          </table>
1284        </dd>
1285      </dl>
1286
1287      <h2>Load Balancing</h2>
1288
1289      <dl>
1290        <dt><strong>Description:</strong></dt>
1291
1292        <dd>Suppose we want to load balance the traffic to
1293        <code>www.foo.com</code> over <code>www[0-5].foo.com</code>
1294        (a total of 6 servers). How can this be done?</dd>
1295
1296        <dt><strong>Solution:</strong></dt>
1297
1298        <dd>
1299          There are a lot of possible solutions for this problem.
1300          We will discuss first a commonly known DNS-based variant
1301          and then the special one with mod_rewrite:
1302
1303          <ol>
1304            <li>
1305              <strong>DNS Round-Robin</strong>
1306
              <p>The simplest method for load-balancing is to use
              the DNS round-robin feature of BIND. Here you just
              configure <code>www[0-5].foo.com</code> as usual in
              your DNS with A (address) records, e.g.</p>
1311
1312              <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1313              cellpadding="5">
1314                <tr>
1315                  <td>
1316<pre>
1317www0   IN  A       1.2.3.1
1318www1   IN  A       1.2.3.2
1319www2   IN  A       1.2.3.3
1320www3   IN  A       1.2.3.4
1321www4   IN  A       1.2.3.5
1322www5   IN  A       1.2.3.6
1323</pre>
1324                  </td>
1325                </tr>
1326              </table>
1327
1328              <p>Then you additionally add the following entry:</p>
1329
1330              <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1331              cellpadding="5">
1332                <tr>
1333                  <td>
1334<pre>
www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.
1342</pre>
1343                  </td>
1344                </tr>
1345              </table>
1346
              <p>Notice that this seems wrong, but is actually an
              intended feature of BIND and can be used in this way.
              Now when <code>www.foo.com</code> gets
              resolved, BIND gives out <code>www0-www5</code>, but
              in a slightly permuted/rotated order every time.
              This way the clients are spread over the various
              servers. But notice that this is not a perfect load
              balancing scheme, because DNS resolution information
              gets cached by the other nameservers on the net, so
              once a client has resolved <code>www.foo.com</code>
              to a particular <code>wwwN.foo.com</code>, all
              subsequent requests also go to this particular name
              <code>wwwN.foo.com</code>. But the final result is
              ok, because the total sum of the requests is really
              spread over the various webservers.</p>
1362            </li>
1363
1364            <li>
1365              <strong>DNS Load-Balancing</strong>
1366
              <p>A sophisticated DNS-based method for
              load-balancing is to use the program
              <code>lbnamed</code> which can be found at <a
              href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
              http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
              It is a Perl 5 program which, in conjunction with
              auxiliary tools, provides real load-balancing for
              DNS.</p>
1375            </li>
1376
1377            <li>
1378              <strong>Proxy Throughput Round-Robin</strong>
1379
1380              <p>In this variant we use mod_rewrite and its proxy
1381              throughput feature. First we dedicate
1382              <code>www0.foo.com</code> to be actually
1383              <code>www.foo.com</code> by using a single</p>
1384
1385              <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1386              cellpadding="5">
1387                <tr>
1388                  <td>
1389<pre>
1390www    IN  CNAME   www0.foo.com.
1391</pre>
1392                  </td>
1393                </tr>
1394              </table>
1395
1396              <p>entry in the DNS. Then we convert
1397              <code>www0.foo.com</code> to a proxy-only server,
1398              i.e. we configure this machine so all arriving URLs
1399              are just pushed through the internal proxy to one of
1400              the 5 other servers (<code>www1-www5</code>). To
1401              accomplish this we first establish a ruleset which
1402              contacts a load balancing script <code>lb.pl</code>
1403              for all URLs.</p>
1404
1405              <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1406              cellpadding="5">
1407                <tr>
1408                  <td>
1409<pre>
1410RewriteEngine on
1411RewriteMap    lb      prg:/path/to/lb.pl
1412RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
1413</pre>
1414                  </td>
1415                </tr>
1416              </table>
1417
1418              <p>Then we write <code>lb.pl</code>:</p>
1419
1420              <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1421              cellpadding="5">
1422                <tr>
1423                  <td>
1424<pre>
1425#!/path/to/perl
1426##
1427##  lb.pl -- load balancing script
1428##
1429
1430$| = 1;
1431
1432$name   = "www";     # the hostname base
1433$first  = 1;         # the first server (not 0 here, because 0 is myself)
1434$last   = 5;         # the last server in the round-robin
$domain = "foo.com"; # the domainname
1436
1437$cnt = 0;
1438while (&lt;STDIN&gt;) {
1439    $cnt = (($cnt+1) % ($last+1-$first));
1440    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
1441    print "http://$server/$_";
1442}
1443
1444##EOF##
1445</pre>
1446                  </td>
1447                </tr>
1448              </table>
1449
              <p>A final note: Why is this useful? Doesn't it seem
              that <code>www0.foo.com</code> is still overloaded?
              The answer is yes, it is overloaded, but only with
              plain proxy throughput requests! All SSI, CGI, ePerl,
              etc. processing is completely done on the other
              machines. This is the essential point.</p>
1456            </li>
1457
1458            <li>
1459              <strong>Hardware/TCP Round-Robin</strong>
1460
1461              <p>There is a hardware solution available, too. Cisco
1462              has a beast called LocalDirector which does a load
1463              balancing at the TCP/IP level. Actually this is some
1464              sort of a circuit level gateway in front of a
1465              webcluster. If you have enough money and really need
1466              a solution with high performance, use this one.</p>
1467            </li>
1468          </ol>
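
          <p>For the Proxy Throughput Round-Robin variant above,
          keep in mind that every server process talks to the same
          <code>lb.pl</code> instance through its STDIN and STDOUT.
          A sketch of the same ruleset with a
          <code>RewriteLock</code> added to synchronize this
          communication follows; the lock file path is a
          placeholder, and the directive belongs in the
          server-wide configuration:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
#   placeholder path, use a file on a local disk
RewriteLock   /path/to/rewrite.lock
RewriteMap    lb      prg:/path/to/lb.pl
RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
</pre>
              </td>
            </tr>
          </table>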
1469        </dd>
1470      </dl>
1471
1472      <h2>New MIME-type, New Service</h2>
1473
1474      <dl>
1475        <dt><strong>Description:</strong></dt>
1476
1477        <dd>
          On the net there are a lot of nifty CGI programs. But
          their usage is usually tedious, so a lot of webmasters
          don't use them. Even Apache's Action handler feature for
          MIME-types is only appropriate when the CGI programs
          don't need special URLs (actually PATH_INFO and
          QUERY_STRING) as their input. First, let us configure a
          new file type with extension <code>.scgi</code> (for
          secure CGI) which will be processed by the popular
          <code>cgiwrap</code> program. The problem here is that
          if, for instance, we use a Homogeneous URL Layout (see
          above), a file inside the user homedirs has the URL
          <code>/u/user/foo/bar.scgi</code>. But
          <code>cgiwrap</code> needs the URL in the form
          <code>/~user/foo/bar.scgi/</code>. The following rule
          solves the problem:
1493
1494          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1495          cellpadding="5">
1496            <tr>
1497              <td>
1498<pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3  [NS,<strong>T=application/x-httpd-cgi</strong>]
1501</pre>
1502              </td>
1503            </tr>
1504          </table>
1505
          <p>Or assume we have some more nifty programs:
          <code>wwwlog</code> (which displays the
          <code>access.log</code> for a URL subtree) and
          <code>wwwidx</code> (which runs Glimpse on a URL
          subtree). We have to provide the URL area to these
          programs so they know which area they have to act on.
          But usually this is ugly, because they are nearly always
          requested from within those very areas, i.e. typically we
          would run the <code>wwwidx</code> program from within
          <code>/u/user/foo/</code> via a hyperlink to</p>
<pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre>

          <p>which is ugly, because we have to hard-code
          <strong>both</strong> the location of the area
          <strong>and</strong> the location of the CGI inside the
          hyperlink. When we have to reorganise the area, we spend a
          lot of time changing the various hyperlinks.</p>
1525        </dd>
1526
1527        <dt><strong>Solution:</strong></dt>
1528
1529        <dd>
1530          The solution here is to provide a special new URL format
1531          which automatically leads to the proper CGI invocation.
1532          We configure the following:
1533
1534          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1535          cellpadding="5">
1536            <tr>
1537              <td>
1538<pre>
1539RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
1540RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
1541</pre>
1542              </td>
1543            </tr>
1544          </table>
1545
1546          <p>Now the hyperlink to search at
1547          <code>/u/user/foo/</code> reads only</p>
1548<pre>
1549HREF="*"
1550</pre>
1551
1552          <p>which internally gets automatically transformed to</p>
1553<pre>
1554/internal/cgi/user/wwwidx?i=/u/user/foo/
1555</pre>
1556
1557          <p>The same approach leads to an invocation for the
1558          access log CGI program when the hyperlink
1559          <code>:log</code> gets used.</p>
1560        </dd>
1561      </dl>
1562
1563      <h2>From Static to Dynamic</h2>
1564
1565      <dl>
1566        <dt><strong>Description:</strong></dt>
1567
1568        <dd>How can we transform a static page
1569        <code>foo.html</code> into a dynamic variant
1570        <code>foo.cgi</code> in a seamless way, i.e. without notice
1571        by the browser/user.</dd>
1572
1573        <dt><strong>Solution:</strong></dt>
1574
1575        <dd>
          We just rewrite the URL to the CGI script and force the
          correct MIME-type so it really gets run as a CGI script.
          This way a request to <code>/~quux/foo.html</code>
          internally leads to the invocation of
          <code>/~quux/foo.cgi</code>.
1581
1582          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1583          cellpadding="5">
1584            <tr>
1585              <td>
1586<pre>
1587RewriteEngine  on
1588RewriteBase    /~quux/
1589RewriteRule    ^foo\.<strong>html</strong>$  foo.<strong>cgi</strong>  [T=<strong>application/x-httpd-cgi</strong>]
1590</pre>
1591              </td>
1592            </tr>
1593          </table>
1594        </dd>
1595      </dl>
1596
1597      <h2>On-the-fly Content-Regeneration</h2>
1598
1599      <dl>
1600        <dt><strong>Description:</strong></dt>
1601
        <dd>Here comes a really esoteric feature: Dynamically
        generated but statically served pages, i.e. pages should be
        delivered as pure static pages (read from the filesystem
        and just passed through), but they have to be generated
        dynamically by the webserver if missing. This way you can
        have CGI-generated pages which are statically served unless
        someone (or a cronjob) removes the static contents. Then the
        contents get refreshed.</dd>
1610
1611        <dt><strong>Solution:</strong></dt>
1612
1613        <dd>
1614          This is done via the following ruleset:
1615
1616          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1617          cellpadding="5">
1618            <tr>
1619              <td>
1620<pre>
1621RewriteCond %{REQUEST_FILENAME}   <strong>!-s</strong>
1622RewriteRule ^page\.<strong>html</strong>$          page.<strong>cgi</strong>   [T=application/x-httpd-cgi,L]
1623</pre>
1624              </td>
1625            </tr>
1626          </table>
1627
          <p>Here a request to <code>page.html</code> leads to an
          internal run of a corresponding <code>page.cgi</code> if
          <code>page.html</code> is still missing or has a file size
          of zero. The trick here is that <code>page.cgi</code> is a
          usual CGI script which (in addition to its STDOUT)
          writes its output to the file <code>page.html</code>.
          Once it has run, the server sends out the data of
          <code>page.html</code>. When the webmaster wants to force
          a refresh of the contents, he just removes
          <code>page.html</code> (usually done by a cronjob).</p>
1638        </dd>
1639      </dl>
1640
1641      <h2>Document With Autorefresh</h2>
1642
1643      <dl>
1644        <dt><strong>Description:</strong></dt>
1645
1646        <dd>Wouldn't it be nice while creating a complex webpage if
1647        the webbrowser would automatically refresh the page every
1648        time we write a new version from within our editor?
1649        Impossible?</dd>
1650
1651        <dt><strong>Solution:</strong></dt>
1652
1653        <dd>
1654          No! We just combine the MIME multipart feature, the
1655          webserver NPH feature and the URL manipulation power of
1656          mod_rewrite. First, we establish a new URL feature:
1657          Adding just <code>:refresh</code> to any URL causes this
1658          to be refreshed every time it gets updated on the
1659          filesystem.
1660
1661          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
1662          cellpadding="5">
1663            <tr>
1664              <td>
1665<pre>
1666RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1
1667</pre>
1668              </td>
1669            </tr>
1670          </table>
1671
1672          <p>Now when we reference the URL</p>
1673<pre>
1674/u/foo/bar/page.html:refresh
1675</pre>
1676
1677          <p>this leads to the internal invocation of the URL</p>
1678<pre>
1679/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
1680</pre>

          <p>The only missing part is the NPH-CGI script. Although
          one would usually say "left as an exercise to the reader"
          ;-) I will provide this, too.</p>
<pre>
#!/sw/bin/perl
##
##  nph-refresh -- NPH/CGI script for auto refreshing pages
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;

#   split the QUERY_STRING variable
@pairs = split(/&amp;/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' . $name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    #   assign through a symbolic reference; eval'ing the value
    #   would execute Perl code supplied in the request
    $$name = $value;
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
    exit(0);
}
if (! -f $QS_f) {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
    exit(0);
}

sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &amp;print_http_headers_multipart_next;
}

sub print_http_headers_multipart_next {
    print "\n--$bound\n";
}

sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
}

sub displayhtml {
    local($buffer) = @_;
    $len = length($buffer);
    print "Content-type: text/html\n";
    print "Content-length: $len\n\n";
    print $buffer;
}

sub readfile {
    local($file) = @_;
    local(*FP, $size, $buffer, $bytes);
    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
    $size = sprintf("%d", $size);
    open(FP, "&lt;$file");
    $bytes = sysread(FP, $buffer, $size);
    close(FP);
    return $buffer;
}

#   send the current version of the page first
$buffer = &amp;readfile($QS_f);
&amp;print_http_headers_multipart_begin;
&amp;displayhtml($buffer);

#   return the modification time of a file
sub mystat {
    local($file) = $_[0];
    local($mtime);

    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
    return $mtime;
}

#   poll the file every $QS_s seconds and send a new
#   body part whenever its modification time changes
$mtimeL = &amp;mystat($QS_f);
$mtime = $mtimeL;
for ($n = 0; $n &lt; $QS_n; $n++) {
    while (1) {
        $mtime = &amp;mystat($QS_f);
        if ($mtime ne $mtimeL) {
            $mtimeL = $mtime;
            sleep(2);
            $buffer = &amp;readfile($QS_f);
            &amp;print_http_headers_multipart_next;
            &amp;displayhtml($buffer);
            sleep(5);
            $mtimeL = &amp;mystat($QS_f);
            last;
        }
        sleep($QS_s);
    }
}

&amp;print_http_headers_multipart_end;

exit(0);

##EOF##
</pre>
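
          <p>The framing that the script writes can be summarized in a
          few lines. The following Python sketch is an illustration
          only, not a replacement for the NPH script: it assembles the
          same sequence of status line, multipart header, boundaries
          and body parts for a given list of page versions. The helper
          names <code>html_part</code> and
          <code>multipart_stream</code> are assumptions made for this
          example.</p>

```python
BOUNDARY = "ThisRandomString12345"  # same fixed boundary string as the script


def html_part(body):
    """One body part: its own headers, a blank line, then the document."""
    return "Content-type: text/html\nContent-length: %d\n\n%s" % (len(body), body)


def multipart_stream(versions):
    """Concatenate everything the script would print for the given versions."""
    out = "HTTP/1.0 200 OK\n"
    out += "Content-type: multipart/x-mixed-replace;boundary=%s\n" % BOUNDARY
    for body in versions:
        out += "\n--%s\n" % BOUNDARY   # a boundary line precedes every part
        out += html_part(body)
    out += "\n--%s--\n" % BOUNDARY     # the closing boundary ends the stream
    return out


print(multipart_stream(["first version", "second version"]))
```

          <p>Each new part replaces the previous one in a browser that
          supports <code>multipart/x-mixed-replace</code>, which is
          what produces the refresh effect.</p>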
        </dd>
      </dl>

      <h2>Mass Virtual Hosting</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>The <code>&lt;VirtualHost&gt;</code> feature of Apache
        is nice and works great when you just have a few dozen
        virtual hosts. But when you are an ISP and have hundreds of
        virtual hosts to provide, this feature is not the best
        choice.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          To provide this feature we look up the requested hostname
          in a rewrite map and rewrite the URL to the document root
          recorded there for that virtual host:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##
##  vhost.map
##
www.vhost1.dom:80  /path/to/docroot/vhost1
www.vhost2.dom:80  /path/to/docroot/vhost2
     :
www.vhostN.dom:80  /path/to/docroot/vhostN
</pre>
              </td>
            </tr>
          </table>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##
##  httpd.conf
##
    :
#   use the canonical hostname on redirects, etc.
UseCanonicalName on

    :
#   add the virtual host in front of the CLF-format
CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %&gt;s %b"
    :

#   enable the rewriting engine in the main server
RewriteEngine on

#   define two maps: one for fixing the URL and one which defines
#   the available virtual hosts with their corresponding
#   DocumentRoot.
RewriteMap    lowercase    int:tolower
RewriteMap    vhost        txt:/path/to/vhost.map

#   Now do the actual virtual host mapping
#   via a huge and complicated single rule:
#
#   1. make sure we don't map for common locations
RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
    :
RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
#
#   2. make sure we have a Host header, because
#      currently our approach only supports
#      virtual hosting through this header
RewriteCond   %{HTTP_HOST}  !^$
#
#   3. lowercase the hostname
RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
#
#   4. lookup this hostname in vhost.map and
#      remember it only when it is a path
#      (and not "NONE" from above)
RewriteCond   ${vhost:%1}  ^(/.*)$
#
#   5. finally we can map the URL to its docroot location
#      and remember the virtual host for logging purposes
RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
    :
</pre>
              </td>
            </tr>
          </table>
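
          <p>The chain of conditions can be hard to follow at first
          sight, so here is a rough Python sketch of steps 2 to 5 (the
          exclusion of common locations in step 1 is left out). It is a
          simplified model under the assumption that the Host header,
          once lowercased, matches a map key exactly;
          <code>parse_map</code> and <code>map_request</code> are
          hypothetical helper names, not mod_rewrite functions.</p>

```python
VHOST_MAP_TEXT = """\
www.vhost1.dom:80  /path/to/docroot/vhost1
www.vhost2.dom:80  /path/to/docroot/vhost2
"""


def parse_map(text):
    """Parse a txt: style map: one 'key value' pair per line, '#' for comments."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and not fields[0].startswith("#"):
            table[fields[0]] = fields[1]
    return table


def map_request(host_header, url_path, vhosts):
    """Return (filename, vhost) for a request, or None if the rule does not apply."""
    if host_header == "":                 # step 2: a Host header is required
        return None
    host = host_header.lower()            # step 3: lowercase the hostname
    docroot = vhosts.get(host, "")        # step 4: look it up in the map
    if not docroot.startswith("/"):       #         and accept only a path
        return None
    return docroot + url_path, host       # step 5: docroot plus original path


vhosts = parse_map(VHOST_MAP_TEXT)
print(map_request("WWW.VHOST1.DOM:80", "/index.html", vhosts))
```

          <p>The second element of the result corresponds to the
          <code>VHOST</code> environment variable that the rule sets
          for logging.</p>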
        </dd>
      </dl>

      <h1>Access Restriction</h1>

      <h2>Blocking of Robots</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we block a really annoying robot from
        retrieving pages of a specific webarea? A
        <code>/robots.txt</code> file containing entries of the
        "Robot Exclusion Protocol" is typically not enough to get
        rid of such a robot.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We use a ruleset which forbids the URLs of the webarea
          <code>/~quux/foo/arc/</code> (perhaps a very deep
          directory-indexed area where the robot traversal would
          create a heavy server load). We have to make sure that we
          forbid access only to the particular robot; just
          forbidding the host where the robot runs is not enough,
          because this would block users from this host, too. We
          accomplish this by also matching the User-Agent HTTP header
          information.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>.*
RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
RewriteRule ^<strong>/~quux/foo/arc/</strong>.+   -   [<strong>F</strong>]
</pre>
              </td>
            </tr>
          </table>
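
          <p>Because consecutive <code>RewriteCond</code> lines are
          combined with a logical AND, the request is only forbidden
          when all three patterns match. A small Python sketch of that
          logic, using the placeholder robot name and address range
          from the ruleset (the function name
          <code>is_forbidden</code> is hypothetical):</p>

```python
import re

# The three patterns from the ruleset above.
AGENT = re.compile(r"^NameOfBadRobot.*")
ADDR = re.compile(r"^123\.45\.67\.[8-9]$")
AREA = re.compile(r"^/~quux/foo/arc/.+")


def is_forbidden(user_agent, remote_addr, url_path):
    """True only when the URL pattern and both conditions match."""
    return all([
        AREA.search(url_path),
        AGENT.search(user_agent),
        ADDR.search(remote_addr),
    ])


print(is_forbidden("NameOfBadRobot/1.0", "123.45.67.8", "/~quux/foo/arc/a.html"))
print(is_forbidden("Mozilla/4.0", "123.45.67.8", "/~quux/foo/arc/a.html"))
```

          <p>An ordinary browser coming from the same address is
          therefore still served.</p>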
        </dd>
      </dl>

      <h2>Blocked Inline-Images</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Assume we have under http://www.quux-corp.de/~quux/
        some pages with inlined GIF graphics. These graphics are
        nice, so others directly incorporate them via hyperlinks to
        their pages. We don't like this practice because it adds
        useless traffic to our server.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          While we cannot 100% protect the images from inclusion,
          we can at least restrict the cases where the browser
          sends an HTTP Referer header.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{HTTP_REFERER} <strong>!^$</strong>
RewriteCond %{HTTP_REFERER} !^http://www\.quux-corp\.de/~quux/.*$ [NC]
RewriteRule <strong>.*\.gif$</strong>        -                                    [F]
</pre>
              </td>
            </tr>
          </table>

          <p>A stricter variant protects a single image so that it
          can only be inlined from one particular page:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{HTTP_REFERER}         !^$
RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
RewriteRule <strong>^inlined-in-foo\.gif$</strong>   -                        [F]
</pre>
              </td>
            </tr>
          </table>
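
          <p>The effect of the first ruleset can be tried out with a
          short Python sketch. It is only an approximation of the
          matching logic, written with the dots in the host name
          escaped; the function name <code>image_blocked</code> and
          the sample referers are made up for the example.</p>

```python
import re

OWN_PAGES = re.compile(r"^http://www\.quux-corp\.de/~quux/.*$", re.IGNORECASE)
GIF = re.compile(r".*\.gif$")


def image_blocked(referer, url_path):
    """Block GIF requests whose Referer is present but not one of our pages."""
    if GIF.search(url_path) is None:
        return False
    if referer == "":          # no Referer header sent: let the request through
        return False
    return OWN_PAGES.search(referer) is None


print(image_blocked("http://www.badguys.com/page.html", "/~quux/pic.gif"))
print(image_blocked("http://www.quux-corp.de/~quux/index.html", "/~quux/pic.gif"))
```

          <p>Requests without any Referer header pass, which is why the
          protection cannot be complete.</p>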
        </dd>
      </dl>

      <h2>Host Deny</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we forbid a list of externally configured hosts
        from using our server?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          For Apache &gt;= 1.3b6:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
RewriteRule   ^/.*  -  [F]
</pre>
              </td>
            </tr>
          </table>

          <p>For Apache &lt; 1.3b6:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ /$1
</pre>
              </td>
            </tr>
          </table>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##
##  hosts.deny
##
##  ATTENTION! This is a map, not a list, even when we treat it as such.
##             mod_rewrite parses it for key/value pairs, so at least a
##             dummy value "-" must be present for each entry.
##

193.102.180.41 -
bsdti1.sdm.de  -
192.76.162.40  -
</pre>
              </td>
            </tr>
          </table>
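
          <p>Since only the presence of a key matters, the map is
          effectively used as a set. The following Python sketch
          models that lookup for the first ruleset; it is an
          illustration under the assumption that keys are compared
          literally, and <code>load_keys</code> and
          <code>is_denied</code> are hypothetical names.</p>

```python
HOSTS_DENY_TEXT = """\
##  hosts.deny
193.102.180.41 -
bsdti1.sdm.de  -
192.76.162.40  -
"""


def load_keys(text):
    """Collect the keys of the map, skipping blank and comment lines."""
    keys = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            keys.add(fields[0])
    return keys


def is_denied(remote_host, remote_addr, denied):
    """True when either the client's host name or its address is a map key."""
    return remote_host in denied or remote_addr in denied


denied = load_keys(HOSTS_DENY_TEXT)
print(is_denied("bsdti1.sdm.de", "10.0.0.1", denied))
print(is_denied("other.example", "10.0.0.1", denied))
```

          <p>The two membership tests correspond to the two
          <code>RewriteCond</code> lines joined by
          <code>[OR]</code>.</p>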
        </dd>
      </dl>

      <h2>URL-Restricted Proxy</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we restrict the proxy to allow access to a
        configurable set of internet sites only? The site list is
        extracted from a prepared bookmarks file.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We first have to make sure mod_rewrite is below(!)
          mod_proxy in the <code>Configuration</code> file when
          compiling the Apache webserver (or in the
          <code>AddModule</code> list of <code>httpd.conf</code> in
          the case of dynamically loaded modules), as it must get
          called <em>before</em> mod_proxy.

          <p>For simplicity, we generate the site list as a
          textfile map (but see the <a
          href="../mod/mod_rewrite.html#RewriteMap">mod_rewrite
          documentation</a> for a conversion script to DBM format).
          A typical Netscape bookmarks file can be converted to a
          list of sites with a shell script like this:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#!/bin/sh
cat ${1:-~/.netscape/bookmarks.html} |
tr -d '\015' | tr '[A-Z]' '[a-z]' | grep href=\" |
sed -e '/href="file:/d;' -e '/href="news:/d;' \
    -e 's|^.*href="[^:]*://\([^:/"]*\).*$|\1 OK|;' \
    -e '/href="/s|^.*href="\([^:/"]*\).*$|\1 OK|;' |
sort -u
</pre>
              </td>
            </tr>
          </table>

          <p>We redirect the resulting output into a text file
          called <code>goodsites.txt</code>. It now looks similar
          to this:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
www.apache.org OK
xml.apache.org OK
jakarta.apache.org OK
perl.apache.org OK
...
</pre>
              </td>
            </tr>
          </table>

          <p>We reference this site file within the configuration
          for the <code>VirtualHost</code> which is responsible for
          serving as a proxy (often not port 80, but 81, 8080 or
          8008).</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
&lt;VirtualHost *:8008&gt;
  ...
  RewriteEngine   On
  # Either use the (plaintext) allow list from goodsites.txt
  RewriteMap      ProxyAllow   txt:/usr/local/apache/conf/goodsites.txt
  # Or, for faster access, convert it to a DBM database:
  #RewriteMap     ProxyAllow   dbm:/usr/local/apache/conf/goodsites
  # Match lowercased hostnames
  RewriteMap      lowercase    int:tolower
  # Here we go:
  # 1) first lowercase the site name and strip off a :port suffix
  RewriteCond  ${lowercase:%{HTTP_HOST}}    ^([^:]*).*$
  # 2) next look it up in the map file.
  #    "%1" is the site name captured by the regex in step 1.
  #    If the result is "OK", proxy access is granted.
  RewriteCond  ${ProxyAllow:%1|DENY}        !^OK$          [NC]
  # 3) Disallow proxy requests if the site was _not_ tagged "OK":
  RewriteRule  ^proxy:                      -              [F]
  ...
&lt;/VirtualHost&gt;
</pre>
              </td>
            </tr>
          </table>
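
          <p>Steps 1 and 2 boil down to a normalized dictionary
          lookup. As a hedged illustration, the Python sketch below
          uses a small dictionary in place of
          <code>goodsites.txt</code>; the name
          <code>proxy_allowed</code> and the sample host names are
          assumptions for the example.</p>

```python
# Stand-in for the goodsites.txt map (site name, then the tag "OK").
GOOD_SITES = {"www.apache.org": "OK", "xml.apache.org": "OK"}


def proxy_allowed(host_header, good_sites):
    """Lowercase the host, drop any :port suffix, then check for the tag 'OK'."""
    site = host_header.lower().split(":", 1)[0]
    # A missing key yields the default "DENY"; the comparison ignores case,
    # like the [NC] flag on the second condition.
    return good_sites.get(site, "DENY").upper() == "OK"


print(proxy_allowed("WWW.Apache.ORG:8080", GOOD_SITES))
print(proxy_allowed("www.badsite.example", GOOD_SITES))
```

          <p>Any site that is not listed falls back to the default
          value and is refused by step 3.</p>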
        </dd>
      </dl>

      <h2>Proxy Deny</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we forbid a certain host or even a user of a
        special host from using the Apache proxy?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We first have to make sure mod_rewrite is below(!)
          mod_proxy in the <code>Configuration</code> file when
          compiling the Apache webserver. This way it gets called
          <em>before</em> mod_proxy. Then we configure the
          following for a host-dependent deny...

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{REMOTE_HOST} <strong>^badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
</pre>
              </td>
            </tr>
          </table>

          <p>...and this one for a user@host-dependent deny:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  <strong>^badguy@badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Special Authentication Variant</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Sometimes a very special authentication is needed, for
        instance an authentication which checks for a set of
        explicitly configured users. Only these should receive
        access and without explicit prompting (which would occur
        when using the Basic Auth via mod_auth).</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We use a list of rewrite conditions to exclude all except
          our friends:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend1@client1\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend2@client2\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend3@client3\.quux-corp\.com$</strong>
RewriteRule ^/~quux/only-for-friends/      -                                 [F]
</pre>
              </td>
            </tr>
          </table>
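
          <p>The three negated conditions are ANDed, so the rule fires
          only for a client that matches none of the friends. A
          compact Python sketch of that reasoning, using a set in
          place of the three patterns (the function name
          <code>is_forbidden</code> is hypothetical, and the sketch
          does not model how the ident lookup itself is
          performed):</p>

```python
FRIENDS = {
    "friend1@client1.quux-corp.com",
    "friend2@client2.quux-corp.com",
    "friend3@client3.quux-corp.com",
}


def is_forbidden(remote_ident, remote_host, url_path):
    """Forbid the protected area unless ident@host is one of the friends."""
    if not url_path.startswith("/~quux/only-for-friends/"):
        return False
    return "%s@%s" % (remote_ident, remote_host) not in FRIENDS


print(is_forbidden("friend2", "client2.quux-corp.com", "/~quux/only-for-friends/a.html"))
print(is_forbidden("stranger", "client1.quux-corp.com", "/~quux/only-for-friends/a.html"))
```

          <p>Note that a friend connecting from a host other than the
          one listed for that user is refused as well.</p>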
        </dd>
      </dl>

      <h2>Referer-based Deflector</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we program a flexible URL Deflector which acts
        on the "Referer" HTTP header and can be configured with as
        many referring pages as we like?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          Use the following really tricky ruleset...

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteMap  deflector txt:/path/to/deflector.map

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
RewriteRule ^.* %{HTTP_REFERER} [R,L]

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
</pre>
              </td>
            </tr>
          </table>

          <p>... in conjunction with a corresponding rewrite
          map:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##
##  deflector.map
##

http://www.badguys.com/bad/index.html    -
http://www.badguys.com/bad/index2.html   -
http://www.badguys.com/bad/index3.html   http://somewhere.com/
</pre>
              </td>
            </tr>
          </table>

          <p>This automatically redirects the request back to the
          referring page (when "-" is used as the value in the map)
          or to a specific URL (when a URL is specified in the map
          as the second argument).</p>
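
          <p>The decision the two rule blocks make can be written down
          as a single function. The Python sketch below is a
          simplified model of it, with the map held in a dictionary;
          the name <code>deflect</code> and the unlisted sample
          referer are assumptions for the example.</p>

```python
DEFLECTOR = {
    "http://www.badguys.com/bad/index.html": "-",
    "http://www.badguys.com/bad/index2.html": "-",
    "http://www.badguys.com/bad/index3.html": "http://somewhere.com/",
}


def deflect(referer, table):
    """Return the redirect target for a request, or None to serve it normally."""
    if referer == "":
        return None
    value = table.get(referer)
    if value is None:           # referring page is not listed in the map
        return None
    if value == "-":
        return referer          # bounce the visitor back to the referring page
    return value                # redirect to the URL given in the map


print(deflect("http://www.badguys.com/bad/index.html", DEFLECTOR))
print(deflect("http://www.badguys.com/bad/index3.html", DEFLECTOR))
```

          <p>Requests without a Referer header, or from a page that is
          not in the map, are not redirected.</p>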
        </dd>
      </dl>

      <h1>Other</h1>

      <h2>External Rewriting Engine</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem?
        There seems to be no solution by the use of mod_rewrite...</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          Use an external rewrite map, i.e. a program which acts
          like a rewrite map. It is run once on startup of Apache,
          receives the requested URLs on STDIN and has to put the
          resulting (usually rewritten) URL on STDOUT (same
          order!).

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteMap    quux-map       <strong>prg:</strong>/path/to/map.quux.pl
RewriteRule   ^/~quux/(.*)$  /~quux/<strong>${quux-map:$1}</strong>
</pre>
              </td>
            </tr>
          </table>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#!/path/to/perl

#   disable buffered I/O which would lead
#   to deadlocks for the Apache server
$| = 1;

#   read URLs one per line from stdin and
#   generate substitution URL on stdout
while (&lt;&gt;) {
    s|^foo/|bar/|;
    print $_;
}
</pre>
              </td>
            </tr>
          </table>

          <p>This is a demonstration-only example and just rewrites
          all URLs <code>/~quux/foo/...</code> to
          <code>/~quux/bar/...</code>. Actually you can program
          whatever you like. But notice that while such maps can be
          <strong>used</strong> also by an average user, only the
          system administrator can <strong>define</strong> them.</p>
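
          <p>The line-by-line protocol is independent of the language
          the map program is written in. As an illustration only, here
          is the same demonstration logic sketched in Python, with the
          loop taking its streams as arguments so that it can be tried
          without a running server; <code>rewrite_key</code> and
          <code>serve</code> are hypothetical names.</p>

```python
import io


def rewrite_key(key):
    """Same substitution as the Perl demo: a leading 'foo/' becomes 'bar/'."""
    if key.startswith("foo/"):
        return "bar/" + key[len("foo/"):]
    return key


def serve(stdin, stdout):
    """Answer one lookup key per input line, flushing after every reply."""
    for line in stdin:
        stdout.write(rewrite_key(line.rstrip("\n")) + "\n")
        stdout.flush()   # an unflushed reply would leave the server waiting


replies = io.StringIO()
serve(io.StringIO("foo/page.html\nbaz/page.html\n"), replies)
print(replies.getvalue(), end="")
```

          <p>A real map program would call the loop with the process's
          standard input and output; exactly one reply line must be
          written for every key that is read.</p>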
        </dd>
      </dl>
          <hr />

    <h3 align="CENTER">Apache HTTP Server Version 1.3</h3>
    <a href="./"><img src="../images/index.gif" alt="Index" /></a>
    <a href="../"><img src="../images/home.gif" alt="Home" /></a>

    </blockquote>
  </body>
</html>
