<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />

    <title>Apache 1.3 URL Rewriting Guide</title>
  </head>
  <!-- Background white, links blue (unvisited), navy (visited), red (active) -->

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
  vlink="#000080" alink="#FF0000">
    <blockquote>
      <div align="CENTER">
        <img src="../images/sub.gif" alt="[APACHE DOCUMENTATION]" />

        <h3>Apache HTTP Server Version 1.3</h3>
      </div>

      <div align="CENTER">
        <h1>Apache 1.3<br />
         URL Rewriting Guide<br />
        </h1>

        <address>
          Originally written by<br />
          Ralf S. Engelschall &lt;rse@apache.org&gt;<br />
          December 1997
        </address>
      </div>

      <p>This document supplements the mod_rewrite <a
      href="../mod/mod_rewrite.html">reference documentation</a>.
      It describes how one can use Apache's mod_rewrite to solve
      typical URL-based problems with which webmasters are often
      confronted. We give detailed descriptions of how to
      solve each problem by configuring URL rewriting rulesets.</p>

      <h2><a id="ToC1" name="ToC1">Introduction to
      mod_rewrite</a></h2>
      The Apache module mod_rewrite is a killer module, i.e. it is a
      really sophisticated module which provides a powerful way to
      do URL manipulations. With it you can do nearly all types of
      URL manipulations you ever dreamed about. The price you have
      to pay is to accept complexity, because mod_rewrite's major
      drawback is that it is not easy for the beginner to understand
      and use. And even Apache experts sometimes discover new
      aspects where mod_rewrite can help.

      <p>In other words: With mod_rewrite you either shoot yourself
      in the foot the first time and never use it again or love it
      for the rest of your life because of its power.
      This paper
      tries to give you a few initial successes to avoid the
      first case by presenting already invented solutions to
      you.</p>

      <h2><a id="ToC2" name="ToC2">Practical Solutions</a></h2>
      Here come a lot of practical solutions I've either invented
      myself or collected from other people's solutions in the past.
      Feel free to learn the black magic of URL rewriting from
      these examples.

      <table bgcolor="#FFE0E0" border="0" cellspacing="0"
      cellpadding="5">
        <tr>
          <td>ATTENTION: Depending on your server configuration it
          can be necessary to slightly change the examples for your
          situation, e.g. adding the [PT] flag when additionally
          using mod_alias and mod_userdir, or rewriting a
          ruleset to fit in <code>.htaccess</code> context instead
          of per-server context. Always try to understand what a
          particular ruleset really does before you use it in order to
          avoid problems.</td>
        </tr>
      </table>

      <h1>URL Layout</h1>

      <h2>Canonical URLs</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>On some webservers there is more than one URL for a
        resource. Usually there are canonical URLs (which should
        actually be used and distributed) and those which are just
        shortcuts, internal ones, etc. Independent of which URL the
        user supplied with the request, he should finally see the
        canonical one only.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We do an external HTTP redirect for all non-canonical
          URLs to fix them in the location view of the browser and
          for all subsequent requests. In the example ruleset below
          we replace <code>/~user</code> by the canonical
          <code>/u/user</code> and fix a missing trailing slash for
          <code>/u/user</code>.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteRule   ^/<strong>~</strong>([^/]+)/?(.*)    /<strong>u</strong>/$1/$2  [<strong>R</strong>]
RewriteRule   ^/([uge])/(<strong>[^/]+</strong>)$  /$1/$2<strong>/</strong>   [<strong>R</strong>]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Canonical Hostnames</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>The goal of this rule is to force the use of a particular
        hostname, in preference to other hostnames which may be used to
        reach the same site. For example, if you wish to force the use
        of <strong>www.example.com</strong> instead of
        <strong>example.com</strong>, you might use a variant of the
        following recipe.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
# For sites running on a port other than 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]

# And for a site running on port 80
RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
RewriteCond %{HTTP_HOST}   !^$
RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Moved DocumentRoot</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Usually the DocumentRoot of the webserver directly
        relates to the URL <code>/</code>. But often this data
        is not really of top-level priority; it is perhaps just one
        of a lot of data pools. For instance at our Intranet
        sites there are <code>/e/www/</code> (the homepage for
        WWW), <code>/e/sww/</code> (the homepage for the Intranet)
        etc.
        Now because the data of the DocumentRoot stays at
        <code>/e/www/</code> we had to make sure that all inlined
        images and other stuff inside this data pool work for
        subsequent requests.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We just redirect the URL <code>/</code> to
          <code>/e/www/</code>. While it seems trivial, it is
          actually trivial only with mod_rewrite, because the
          typical old mechanisms of URL <em>Aliases</em> (as
          provided by mod_alias and friends) only use
          <em>prefix</em> matching. With this you cannot do such a
          redirection because the DocumentRoot is a prefix of all
          URLs. With mod_rewrite it is really trivial:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteRule   <strong>^/$</strong>  /e/www/  [<strong>R</strong>]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Trailing Slash Problem</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Every webmaster can sing a song about the problem of
        the trailing slash on URLs referencing directories. If it
        is missing, the server dumps an error, because if you say
        <code>/~quux/foo</code> instead of <code>/~quux/foo/</code>
        then the server searches for a <em>file</em> named
        <code>foo</code>. And because this file is a directory it
        complains. Actually it tries to fix this itself in most
        cases, but sometimes this mechanism needs to be emulated
        by you, for instance after you have done a lot of
        complicated URL rewritings to CGI scripts etc.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          The solution to this subtle problem is to let the server
          add the trailing slash automatically. To do this
          correctly we have to use an external redirect, so the
          browser correctly requests subsequent images etc.
          If we
          only did an internal rewrite, this would only work for the
          directory page, but would go wrong when any images are
          included into this page with relative URLs, because the
          browser would resolve each in-lined object against the
          wrong base URL. For instance, a
          request for <code>image.gif</code> in
          <code>/~quux/foo/index.html</code> would become
          <code>/~quux/image.gif</code> without the external
          redirect!

          <p>So, to do this trick we write:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo<strong>$</strong>  foo<strong>/</strong>  [<strong>R</strong>]
</pre>
              </td>
            </tr>
          </table>

          <p>The crazy and lazy can even do the following in the
          top-level <code>.htaccess</code> file of their homedir.
          But notice that this creates some processing
          overhead.</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteCond    %{REQUEST_FILENAME}  <strong>-d</strong>
RewriteRule    ^(.+<strong>[^/]</strong>)$           $1<strong>/</strong>  [R]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Webcluster through Homogeneous URL Layout</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>We want to create a homogeneous and consistent URL
        layout over all WWW servers on an Intranet webcluster, i.e.
        all URLs (per definition server local and thus server
        dependent!) actually become server <em>independent</em>!
        What we want is to give the WWW namespace a consistent
        server-independent layout: no URL should have to include
        any physically correct target server.
        The cluster itself
        should drive us automatically to the physical target
        host.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          First, the knowledge of the target servers comes from
          (distributed) external maps which contain information
          about where our users, groups and entities stay. They have the
          form
<pre>
user1  server_of_user1
user2  server_of_user2
:      :
</pre>

          <p>We put them into files <code>map.xxx-to-host</code>.
          Second we need to instruct all servers to redirect URLs
          of the forms</p>
<pre>
/u/user/anypath
/g/group/anypath
/e/entity/anypath
</pre>

          <p>to</p>
<pre>
http://physical-host/u/user/anypath
http://physical-host/g/group/anypath
http://physical-host/e/entity/anypath
</pre>

          <p>when the URL is not locally valid to a server. The
          following ruleset does this for us with the help of the map
          files (assuming that server0 is a default server which
          will be used if a user has no entry in the map):</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on

RewriteMap      user-to-host   txt:/path/to/map.user-to-host
RewriteMap     group-to-host   txt:/path/to/map.group-to-host
RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host

RewriteRule   ^/u/<strong>([^/]+)</strong>/?(.*)   http://<strong>${user-to-host:$1|server0}</strong>/u/$1/$2
RewriteRule   ^/g/<strong>([^/]+)</strong>/?(.*)  http://<strong>${group-to-host:$1|server0}</strong>/g/$1/$2
RewriteRule   ^/e/<strong>([^/]+)</strong>/?(.*) http://<strong>${entity-to-host:$1|server0}</strong>/e/$1/$2

RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Move Homedirs to Different Webserver</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Many webmasters have asked for a
        solution to the
        following situation: They wanted to redirect all
        homedirs on a webserver to another webserver. They usually
        need such things when establishing a newer webserver which
        will replace the old one over time.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          The solution is trivial with mod_rewrite. On the old
          webserver we just redirect all
          <code>/~user/anypath</code> URLs to
          <code>http://newserver/~user/anypath</code>.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteRule   ^/~(.+)  http://<strong>newserver</strong>/~$1  [R,L]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Structured Homedirs</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Some sites with thousands of users use a
        structured homedir layout, i.e. each homedir is in a
        subdirectory which begins for instance with the first
        character of the username. So, <code>/~foo/anypath</code>
        is <code>/home/<strong>f</strong>/foo/.www/anypath</code>
        while <code>/~bar/anypath</code> is
        <code>/home/<strong>b</strong>/bar/.www/anypath</code>.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We use the following ruleset to expand the tilde URLs
          into exactly the above layout.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteRule   ^/~(<strong>([a-z])</strong>[a-z0-9]+)(.*)  /home/<strong>$2</strong>/$1/.www$3
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Filesystem Reorganisation</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>
          This really is a hardcore example: a killer application
          which heavily uses per-directory
          <code>RewriteRules</code> to get a smooth look and feel
          on the Web while its data structure is never touched or
          adjusted. Background: <strong><em>net.sw</em></strong> is
          my archive of freely available Unix software packages,
          which I started to collect in 1992. It is both my hobby
          and job to do this, because while I'm studying computer
          science I have also worked for many years as a system and
          network administrator in my spare time. Every week I need
          some sort of software so I created a deep hierarchy of
          directories where I stored the packages:
<pre>
drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
</pre>

          <p>In July 1996 I decided to make this archive
          public to
          the world via a nice Web interface. "Nice" means that I
          wanted to offer an interface where you can browse
          directly through the archive hierarchy. And "nice" means
          that I didn't want to change anything inside this
          hierarchy - not even by putting some CGI scripts at the
          top of it. Why? Because the above structure should
          later be accessible via FTP as well, and I didn't want any
          Web or CGI stuff to be there.</p>
        </dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          The solution has two parts: The first is a set of CGI
          scripts which create all the pages at all directory
          levels on-the-fly. I put them under
          <code>/e/netsw/.www/</code> as follows:
<pre>
-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
</pre>

          <p>The <code>DATA/</code> subdirectory holds the above
          directory structure, i.e. the real
          <strong><em>net.sw</em></strong> stuff, and gets
          automatically updated via <code>rdist</code> from time to
          time. The second part of the problem remains: how to link
          these two structures together into one smooth-looking URL
          tree? We want to hide the <code>DATA/</code> directory
          from the user while running the appropriate CGI scripts
          for the various URLs.
          Here is the solution: first I put
          the following into the per-directory configuration file
          in the DocumentRoot of the server to rewrite the
          announced URL <code>/net.sw/</code> to the internal path
          <code>/e/netsw</code>:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteRule  ^net.sw$       net.sw/        [R]
RewriteRule  ^net.sw/(.*)$  e/netsw/$1
</pre>
              </td>
            </tr>
          </table>

          <p>The first rule is for requests which miss the trailing
          slash! The second rule does the real thing. And then
          comes the killer configuration which stays in the
          per-directory config file
          <code>/e/netsw/.www/.wwwacl</code>:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
Options       ExecCGI FollowSymLinks Includes MultiViews

RewriteEngine on

#  we are reached via /net.sw/ prefix
RewriteBase   /net.sw/

#  first we rewrite the root dir to
#  the handling cgi script
RewriteRule   ^$                       netsw-home.cgi     [L]
RewriteRule   ^index\.html$            netsw-home.cgi     [L]

#  strip out the subdirs when
#  the browser requests us from perdir pages
RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]

#  and now break the rewriting for local files
RewriteRule   ^netsw-home\.cgi.*       -                  [L]
RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
RewriteRule   ^netsw-search\.cgi.*     -                  [L]
RewriteRule   ^netsw-tree\.cgi$        -                  [L]
RewriteRule   ^netsw-about\.html$      -                  [L]
RewriteRule   ^netsw-img/.*$           -                  [L]

#  anything else is a subdir which gets handled
#  by another cgi script
RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
RewriteRule   (.*)                     netsw-lsdir.cgi/$1
</pre>
              </td>
            </tr>
          </table>

          <p>Some hints for interpretation:</p>

          <ol>
            <li>Notice the L (last) flag and no substitution field
            ('-') in the fourth part</li>

            <li>Notice the !
            (not) character and the C (chain) flag
            at the first rule in the last part</li>

            <li>Notice the catch-all pattern in the last rule</li>
          </ol>
        </dd>
      </dl>

      <h2>NCSA imagemap to Apache mod_imap</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>When switching from the NCSA webserver to the more
        modern Apache webserver a lot of people want a smooth
        transition. So they want pages which use their old NCSA
        <code>imagemap</code> program to work under Apache with the
        modern <code>mod_imap</code>. The problem is that there are
        a lot of hyperlinks around which reference the
        <code>imagemap</code> program via
        <code>/cgi-bin/imagemap/path/to/page.map</code>. Under
        Apache this has to read just
        <code>/path/to/page.map</code>.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We use a global rule to remove the prefix on-the-fly for
          all requests:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine  on
RewriteRule    ^/cgi-bin/imagemap(.*)  $1  [PT]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Search pages in more than one directory</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Sometimes it is necessary to let the webserver search
        for pages in more than one directory. Here MultiViews or
        other techniques cannot help.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We program an explicit ruleset which searches for the
          files in the directories.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on

#   first try to find it in dir1/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/<strong>dir1</strong>/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/<strong>dir1</strong>/$1  [L]

#   second try to find it in dir2/...
#   ...and if found stop and be happy:
RewriteCond         /your/docroot/<strong>dir2</strong>/%{REQUEST_FILENAME}  -f
RewriteRule  ^(.+)  /your/docroot/<strong>dir2</strong>/$1  [L]

#   else go on for other Alias or ScriptAlias directives,
#   etc.
RewriteRule   ^(.+)  -  [PT]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Set Environment Variables According To URL Parts</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Perhaps you want to keep status information between
        requests and use the URL to encode it. But you don't want
        to use a CGI wrapper for all pages just to strip out this
        information.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We use a rewrite rule to strip out the status information
          and remember it via an environment variable which can
          later be dereferenced from within XSSI or CGI. This way a
          URL <code>/foo/S=java/bar/</code> gets translated to
          <code>/foo/bar/</code> and the environment variable named
          <code>STATUS</code> is set to the value "java".

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteRule   ^(.*)/<strong>S=([^/]+)</strong>/(.*)    $1/$3 [E=<strong>STATUS:$2</strong>]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Virtual User Hosts</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Assume that you want to provide
        <code>www.<strong>username</strong>.host.com</code>
        for the homepage of username via just DNS A records to the
        same machine and without any virtualhosts on this
        machine.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          For HTTP/1.0 requests there is no solution, but for
          HTTP/1.1 requests which contain a Host: HTTP header we
          can use the following ruleset to rewrite
          <code>http://www.username.host.com/anypath</code>
          internally to <code>/home/username/anypath</code>:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteCond   %{<strong>HTTP_HOST</strong>}                 ^www\.<strong>[^.]+</strong>\.host\.com$
RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
RewriteRule   ^www\.<strong>([^.]+)</strong>\.host\.com(.*) /home/<strong>$1</strong>$2
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Redirect Homedirs For Foreigners</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>We want to redirect homedir URLs to another webserver
        <code>www.somewhere.com</code> when the requesting user
        is not in the local domain
        <code>ourdomain.com</code>.
        This is sometimes used in
        virtual host contexts.</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          Just a rewrite condition:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteCond   %{REMOTE_HOST}  <strong>!^.+\.ourdomain\.com$</strong>
RewriteRule   ^(/~.+)         http://www.somewhere.com$1 [R,L]
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Redirect Failing URLs To Other Webserver</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>A typical FAQ about URL rewriting is how to redirect
        failing requests on webserver A to webserver B. Usually
        this is done via ErrorDocument CGI scripts in Perl, but
        there is also a mod_rewrite solution. But notice that this
        performs worse than using an ErrorDocument
        CGI script!</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          The first solution has the best performance but less
          flexibility and is less robust:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteCond   /your/docroot/%{REQUEST_FILENAME} <strong>!-f</strong>
RewriteRule   ^(.+)                             http://<strong>webserverB</strong>.dom/$1
</pre>
              </td>
            </tr>
          </table>

          <p>The problem here is that this will only work for pages
          inside the DocumentRoot. While you can add more
          conditions (for instance to also handle homedirs, etc.)
          there is a better variant:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteCond   %{REQUEST_URI} <strong>!-U</strong>
RewriteRule   ^(.+)          http://<strong>webserverB</strong>.dom/$1
</pre>
              </td>
            </tr>
          </table>

          <p>This uses the URL look-ahead feature of mod_rewrite.
          The result is that this will work for all types of URLs
          and is a safe way.
          But it has a performance impact on
          the webserver, because for every request there is one
          more internal subrequest. So, if your webserver runs on a
          powerful CPU, use this one. If it is a slow machine, use
          the first approach or, better, an ErrorDocument
          CGI script.</p>
        </dd>
      </dl>

      <h2>Extended Redirection</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Sometimes we need more control (concerning the
        character escaping mechanism) of URLs on redirects. Usually
        the Apache kernel's URL escape function also escapes
        anchors, i.e. URLs like "url#anchor". You cannot use this
        directly on redirects with mod_rewrite because the
        uri_escape() function of Apache would also escape the hash
        character. How can we redirect to such a URL?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We have to use a kludge: an NPH-CGI script
          which does the redirect itself, because here no escaping
          is done (NPH=non-parseable headers). First we introduce a
          new URL scheme <code>xredirect:</code> by the following
          per-server config line (it should be one of the last rewrite
          rules):

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteRule ^xredirect:(.+) /path/to/nph-xredirect.cgi/$1 \
            [T=application/x-httpd-cgi,L]
</pre>
              </td>
            </tr>
          </table>

          <p>This forces all URLs prefixed with
          <code>xredirect:</code> to be piped through the
          <code>nph-xredirect.cgi</code> program. And this program
          just looks like:</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
#!/path/to/perl
##
##  nph-xredirect.cgi -- NPH/CGI script for extended redirects
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##

$| = 1;
$url = $ENV{'PATH_INFO'};

print "HTTP/1.0 302 Moved Temporarily\n";
print "Server: $ENV{'SERVER_SOFTWARE'}\n";
print "Location: $url\n";
print "Content-type: text/html\n";
print "\n";
print "&lt;html&gt;\n";
print "&lt;head&gt;\n";
print "&lt;title&gt;302 Moved Temporarily (EXTENDED)&lt;/title&gt;\n";
print "&lt;/head&gt;\n";
print "&lt;body&gt;\n";
print "&lt;h1&gt;Moved Temporarily (EXTENDED)&lt;/h1&gt;\n";
print "The document has moved &lt;a HREF=\"$url\"&gt;here&lt;/a&gt;.&lt;p&gt;\n";
print "&lt;/body&gt;\n";
print "&lt;/html&gt;\n";

##EOF##
</pre>
              </td>
            </tr>
          </table>

          <p>This provides you with the functionality to do
          redirects to all URL schemes, i.e. including the ones
          which are not directly accepted by mod_rewrite. For
          instance you can now also redirect to
          <code>news:newsgroup</code> via</p>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteRule ^anyurl  xredirect:news:newsgroup
</pre>
              </td>
            </tr>
          </table>

          <p>Notice: You must not put [R] or [R,L] on the above
          rule because the <code>xredirect:</code> prefix needs to be
          expanded later by our special "pipe through" rule
          above.</p>
        </dd>
      </dl>

      <h2>Archive Access Multiplexer</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>Do you know the great CPAN (Comprehensive Perl Archive
        Network) under <a
        href="http://www.perl.com/CPAN">http://www.perl.com/CPAN</a>?
        This does a redirect to one of several FTP servers around
        the world which carries a CPAN mirror and is located
        near the requesting client. Actually this
        can be called an FTP access multiplexing service.
        While
        CPAN runs via CGI scripts, how can a similar approach be
        implemented via mod_rewrite?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          First we notice that from version 3.0.0 mod_rewrite can
          also use the "ftp:" scheme on redirects. And second, the
          location approximation can be done by a RewriteMap over
          the top-level domain of the client. With a tricky chained
          ruleset we can use this top-level domain as a key to our
          multiplexing map.

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteMap    multiplex                txt:/path/to/map.cxan
RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
RewriteRule   ^.+\.<strong>([a-zA-Z]+)</strong>::(.*)$  ${multiplex:<strong>$1</strong>|ftp.default.dom}$2  [R,L]
</pre>
              </td>
            </tr>
          </table>

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
##
##  map.cxan -- Multiplexing Map for CxAN
##

de        ftp://ftp.cxan.de/CxAN/
uk        ftp://ftp.cxan.uk/CxAN/
com       ftp://ftp.cxan.com/CxAN/
 :
##EOF##
</pre>
              </td>
            </tr>
          </table>
        </dd>
      </dl>

      <h2>Time-Dependent Rewriting</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>When tricks like time-dependent content should happen, a
        lot of webmasters still use CGI scripts which do, for
        instance, redirects to specialized pages. How can it be done
        via mod_rewrite?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          There are a lot of variables named <code>TIME_xxx</code>
          for rewrite conditions.
          In conjunction with the special
          lexicographic comparison patterns &lt;STRING, &gt;STRING
          and =STRING we can do time-dependent redirects:

          <table bgcolor="#E0E5F5" border="0" cellspacing="0"
          cellpadding="5">
            <tr>
              <td>
<pre>
RewriteEngine on
RewriteCond   %{TIME_HOUR}%{TIME_MIN} &gt;0700
RewriteCond   %{TIME_HOUR}%{TIME_MIN} &lt;1900
RewriteRule   ^foo\.html$             foo.day.html
RewriteRule   ^foo\.html$             foo.night.html
</pre>
              </td>
            </tr>
          </table>

          <p>This provides the content of <code>foo.day.html</code>
          under the URL <code>foo.html</code> from 07:00-19:00 and
          at the remaining time the contents of
          <code>foo.night.html</code>. Just a nice feature for a
          homepage...</p>
        </dd>
      </dl>

      <h2>Backward Compatibility for YYYY to XXXX migration</h2>

      <dl>
        <dt><strong>Description:</strong></dt>

        <dd>How can we make URLs backward compatible (still
        existing virtually) after migrating document.YYYY to
        document.XXXX, e.g. after translating a bunch of .html
        files to .phtml?</dd>

        <dt><strong>Solution:</strong></dt>

        <dd>
          We just rewrite the name to its basename and test for
          existence of the new extension. If it exists, we take
          that name, else we rewrite the URL to its original state.


        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
#   backward compatibility ruleset for
#   rewriting document.html to document.phtml
#   when and only when document.phtml exists
#   but no longer document.html
RewriteEngine on
RewriteBase   /~quux/
#   parse out basename, but remember the fact
RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
#   rewrite to document.phtml if exists
RewriteCond   %{REQUEST_FILENAME}.phtml -f
RewriteRule   ^(.*)$ $1.phtml                   [S=1]
#   else reverse the previous basename cutout
RewriteCond   %{ENV:WasHTML}            ^yes$
RewriteRule   ^(.*)$ $1.html
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h1>Content Handling</h1>

    <h2>From Old to New (internal)</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Assume we have recently renamed the page
      <code>foo.html</code> to <code>bar.html</code> and now want
      to provide the old URL for backward compatibility. Actually
      we want users of the old URL not even to notice that
      the page was renamed.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We rewrite the old URL to the new one internally via the
        following rule:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>foo</strong>\.html$  <strong>bar</strong>.html
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>From Old to New (external)</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Assume again that we have recently renamed the page
      <code>foo.html</code> to <code>bar.html</code> and now want
      to provide the old URL for backward compatibility. But this
      time we want the users of the old URL to be pointed to
      the new one, i.e.
their browser's Location field should
      change, too.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We force an HTTP redirect to the new URL, which changes
        what the browser, and thus the user, sees:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>foo</strong>\.html$  <strong>bar</strong>.html  [<strong>R</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Browser Dependent Content</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>At least for important top-level pages it is sometimes
      necessary to provide browser-dependent
      content, i.e. one has to provide a maximum version for the
      latest Netscape variants, a minimum version for the Lynx
      browsers and an average feature version for all others.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We cannot use content negotiation because the browsers do
        not provide their type in that form. Instead we have to
        act on the HTTP header "User-Agent". The following config
        does this: If the HTTP header "User-Agent"
        begins with "Mozilla/3", the page <code>foo.html</code>
        is rewritten to <code>foo.NS.html</code> and the
        rewriting stops. If the browser is "Lynx" or "Mozilla" of
        version 1 or 2, the URL becomes <code>foo.20.html</code>.
        All other browsers receive the page <code>foo.32.html</code>.
        This is done by the following ruleset:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/3</strong>.*
RewriteRule ^foo\.html$         foo.<strong>NS</strong>.html          [<strong>L</strong>]

RewriteCond %{HTTP_USER_AGENT}  ^<strong>Lynx/</strong>.*         [OR]
RewriteCond %{HTTP_USER_AGENT}  ^<strong>Mozilla/[12]</strong>.*
RewriteRule ^foo\.html$         foo.<strong>20</strong>.html          [<strong>L</strong>]

RewriteRule ^foo\.html$         foo.<strong>32</strong>.html          [<strong>L</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Dynamic Mirror</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Assume there are nice webpages on remote hosts we want
      to bring into our namespace. For FTP servers we would use
      the <code>mirror</code> program which actually maintains an
      explicit up-to-date copy of the remote data on the local
      machine. For a webserver we could use the program
      <code>webcopy</code> which acts similarly via HTTP. But both
      techniques have one major drawback: The local copy is
      only as up-to-date as our last run of the program. It
      would be much better if the mirror were not a static one we
      have to establish explicitly.
Instead we want a dynamic
      mirror with data which gets updated automatically when
      there is need (updated data on the remote host).</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        To provide this feature we map the remote webpage or even
        the complete remote webarea to our namespace by the use
        of the <i>Proxy Throughput</i> feature (flag [P]):

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>hotsheet/</strong>(.*)$  <strong>http://www.tstimpreso.com/hotsheet/</strong>$1  [<strong>P</strong>]
</pre>
            </td>
          </tr>
        </table>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^<strong>usa-news\.html</strong>$   <strong>http://www.quux-corp.com/news/index.html</strong>  [<strong>P</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Reverse Dynamic Mirror</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>...</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine on
RewriteCond   /mirror/of/remotesite/$1           -U
RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Retrieve Missing Data from Intranet</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>This is a tricky way of virtually running a corporate
      (external) Internet webserver
      (<code>www.quux-corp.dom</code>), while actually keeping
      and maintaining its data on an (internal) Intranet webserver
      (<code>www2.quux-corp.dom</code>) which is protected by a
firewall. The trick is that on the external webserver we
      retrieve the requested data on-the-fly from the internal
      one.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        First, we have to make sure that our firewall still
        protects the internal webserver and that only the
        external webserver is allowed to retrieve data from it.
        For a packet-filtering firewall we could for instance
        configure a firewall ruleset like the following:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
<strong>ALLOW</strong> Host www.quux-corp.dom Port &gt;1024 --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
<strong>DENY</strong>  Host *                 Port *     --&gt; Host www2.quux-corp.dom Port <strong>80</strong>
</pre>
            </td>
          </tr>
        </table>

        <p>Just adjust it to your actual configuration syntax.
        Now we can establish the mod_rewrite rules which request
        the missing data in the background through the proxy
        throughput feature:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
RewriteCond %{REQUEST_FILENAME}       <strong>!-f</strong>
RewriteCond %{REQUEST_FILENAME}       <strong>!-d</strong>
RewriteRule ^/home/([^/]+)/.www/?(.*) http://<strong>www2</strong>.quux-corp.dom/~$1/pub/$2 [<strong>P</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Load Balancing</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Suppose we want to load balance the traffic to
      <code>www.foo.com</code> over <code>www[0-5].foo.com</code>
      (a total of 6 servers). How can this be done?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        There are a lot of possible solutions for this problem.
        We first discuss the commonly known DNS-based variants
        and then the special one with mod_rewrite:

        <ol>
          <li>
            <strong>DNS Round-Robin</strong>

            <p>The simplest method for load-balancing is to use
            the DNS round-robin feature of BIND. Here you just
            configure <code>www[0-5].foo.com</code> as usual in
            your DNS with A (address) records, e.g.</p>

            <table bgcolor="#E0E5F5" border="0" cellspacing="0"
            cellpadding="5">
              <tr>
                <td>
<pre>
www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www5   IN  A       1.2.3.6
</pre>
                </td>
              </tr>
            </table>

            <p>Then you additionally add the following entry:</p>

            <table bgcolor="#E0E5F5" border="0" cellspacing="0"
            cellpadding="5">
              <tr>
                <td>
<pre>
www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.
</pre>
                </td>
              </tr>
            </table>

            <p>Notice that this seems wrong, but is actually an
            intended feature of BIND and can be used in this way.
            Now when <code>www.foo.com</code> gets
            resolved, BIND gives out <code>www0-www5</code>, but
            in a slightly permuted/rotated order every time.
            This way the clients are spread over the various
            servers. But notice that this is not a perfect load
            balancing scheme, because DNS resolution results
            get cached by the other nameservers on the net, so
            once a client has resolved <code>www.foo.com</code>
            to a particular <code>wwwN.foo.com</code>, all
            subsequent requests also go to this particular name
            <code>wwwN.foo.com</code>.
But the final result is
            ok, because the total sum of the requests is really
            spread over the various webservers.</p>
          </li>

          <li>
            <strong>DNS Load-Balancing</strong>

            <p>A sophisticated DNS-based method for
            load-balancing is to use the program
            <code>lbnamed</code> which can be found at <a
            href="http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html">
            http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html</a>.
            It is a Perl 5 program which, in conjunction with auxiliary
            tools, provides real load-balancing for
            DNS.</p>
          </li>

          <li>
            <strong>Proxy Throughput Round-Robin</strong>

            <p>In this variant we use mod_rewrite and its proxy
            throughput feature. First we dedicate
            <code>www0.foo.com</code> to be actually
            <code>www.foo.com</code> by using a single</p>

            <table bgcolor="#E0E5F5" border="0" cellspacing="0"
            cellpadding="5">
              <tr>
                <td>
<pre>
www    IN  CNAME   www0.foo.com.
</pre>
                </td>
              </tr>
            </table>

            <p>entry in the DNS. Then we convert
            <code>www0.foo.com</code> to a proxy-only server,
            i.e. we configure this machine so all arriving URLs
            are just pushed through the internal proxy to one of
            the 5 other servers (<code>www1-www5</code>).
To
            accomplish this we first establish a ruleset which
            contacts a load balancing script <code>lb.pl</code>
            for all URLs.</p>

            <table bgcolor="#E0E5F5" border="0" cellspacing="0"
            cellpadding="5">
              <tr>
                <td>
<pre>
RewriteEngine on
RewriteMap    lb      prg:/path/to/lb.pl
RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
</pre>
                </td>
              </tr>
            </table>

            <p>Then we write <code>lb.pl</code>:</p>

            <table bgcolor="#E0E5F5" border="0" cellspacing="0"
            cellpadding="5">
              <tr>
                <td>
<pre>
#!/path/to/perl
##
##  lb.pl -- load balancing script
##

$| = 1;              # unbuffered output, required for prg: maps

$name   = "www";     # the hostname base
$first  = 1;         # the first server (not 0 here, because 0 is myself)
$last   = 5;         # the last server in the round-robin
$domain = "foo.com"; # the domainname

$cnt = 0;
while (&lt;STDIN&gt;) {
    #   advance to the next server, wrapping around after the last one
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}

##EOF##
</pre>
                </td>
              </tr>
            </table>

            <p>A last note: Why is this useful? It seems like
            <code>www0.foo.com</code> is still overloaded. The
            answer is yes, it is overloaded, but with plain proxy
            throughput requests only! All SSI, CGI, ePerl, etc.
            processing is completely done on the other machines.
            This is the essential point.</p>
          </li>

          <li>
            <strong>Hardware/TCP Round-Robin</strong>

            <p>There is a hardware solution available, too. Cisco
            has a beast called LocalDirector which does load
            balancing at the TCP/IP level. Actually this is some
            sort of circuit level gateway in front of a
            webcluster.
If you have enough money and really need
            a solution with high performance, use this one.</p>
          </li>
        </ol>
      </dd>
    </dl>

    <h2>New MIME-type, New Service</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>
        On the net there are a lot of nifty CGI programs. But
        their usage is usually cumbersome, so a lot of webmasters
        don't use them. Even Apache's Action handler feature for
        MIME-types is only appropriate when the CGI programs
        don't need special URLs (actually PATH_INFO and
        QUERY_STRING) as their input. First, let us configure a
        new file type with extension <code>.scgi</code> (for
        secure CGI) which will be processed by the popular
        <code>cgiwrap</code> program. The problem here is that
        if, for instance, we use a Homogeneous URL Layout (see above),
        a file inside the user homedirs has the URL
        <code>/u/user/foo/bar.scgi</code>. But
        <code>cgiwrap</code> needs the URL in the form
        <code>/~user/foo/bar.scgi/</code>. The following rule
        solves the problem:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteRule ^/[uge]/<strong>([^/]+)</strong>/\.www/(.+)\.scgi(.*) ...
... /internal/cgi/user/cgiwrap/~<strong>$1</strong>/$2.scgi$3  [NS,<strong>T=application/x-httpd-cgi</strong>]
</pre>
            </td>
          </tr>
        </table>

        <p>Or assume we have some more nifty programs:
        <code>wwwlog</code> (which displays the
        <code>access.log</code> for a URL subtree) and
        <code>wwwidx</code> (which runs Glimpse on a URL
        subtree). We have to provide the URL area to these
        programs so they know which area they have to act on.
        But usually this is ugly, because they are nearly always
        requested from within those very areas, i.e.
typically we would
        run the <code>wwwidx</code> program from within
        <code>/u/user/foo/</code> via a hyperlink to</p>
<pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre>

        <p>which is ugly, because we have to hard-code
        <strong>both</strong> the location of the area
        <strong>and</strong> the location of the CGI inside the
        hyperlink. When we have to reorganise the area, we spend a
        lot of time changing the various hyperlinks.</p>
      </dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        The solution here is to provide a special new URL format
        which automatically leads to the proper CGI invocation.
        We configure the following:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
</pre>
            </td>
          </tr>
        </table>

        <p>Now the hyperlink to search at
        <code>/u/user/foo/</code> reads only</p>
<pre>
HREF="*"
</pre>

        <p>which internally gets automatically transformed to</p>
<pre>
/internal/cgi/user/wwwidx?i=/u/user/foo/
</pre>

        <p>The same approach leads to an invocation of the
        access log CGI program when the hyperlink
        <code>:log</code> gets used.</p>
      </dd>
    </dl>

    <h2>From Static to Dynamic</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>How can we transform a static page
      <code>foo.html</code> into a dynamic variant
      <code>foo.cgi</code> in a seamless way, i.e. without the
      browser/user noticing?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We just rewrite the URL to the CGI-script and force the
        correct MIME-type so it really gets run as a CGI-script.
        This way a request to <code>/~quux/foo.html</code>
        internally leads to the invocation of
        <code>/~quux/foo.cgi</code>.

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine  on
RewriteBase    /~quux/
RewriteRule    ^foo\.<strong>html</strong>$  foo.<strong>cgi</strong>  [T=<strong>application/x-httpd-cgi</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>On-the-fly Content-Regeneration</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Here comes a really esoteric feature: Dynamically
      generated but statically served pages, i.e. pages should be
      delivered as pure static pages (read from the filesystem
      and just passed through), but they have to be generated
      dynamically by the webserver if missing. This way you can
      have CGI-generated pages which are statically served unless
      someone (or a cronjob) removes the static contents. Then the
      contents get refreshed.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        This is done via the following ruleset:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{REQUEST_FILENAME}   <strong>!-s</strong>
RewriteRule ^page\.<strong>html</strong>$          page.<strong>cgi</strong>   [T=application/x-httpd-cgi,L]
</pre>
            </td>
          </tr>
        </table>

        <p>Here a request to <code>page.html</code> leads to an
        internal run of the corresponding <code>page.cgi</code> if
        <code>page.html</code> is still missing or has a file size
        of zero. The trick here is that <code>page.cgi</code> is a
        usual CGI script which (in addition to its STDOUT)
        writes its output to the file <code>page.html</code>.
        Once it has run, the server sends out the data of
        <code>page.html</code>.
When the webmaster wants to force
        a refresh of the contents, he just removes
        <code>page.html</code> (usually done by a cronjob).</p>
      </dd>
    </dl>

    <h2>Document With Autorefresh</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Wouldn't it be nice while creating a complex webpage if
      the webbrowser would automatically refresh the page every
      time we write a new version from within our editor?
      Impossible?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        No! We just combine the MIME multipart feature, the
        webserver NPH feature and the URL manipulation power of
        mod_rewrite. First, we establish a new URL feature:
        Adding just <code>:refresh</code> to any URL causes the page
        to be refreshed every time it gets updated on the
        filesystem.

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1
</pre>
            </td>
          </tr>
        </table>

        <p>Now when we reference the URL</p>
<pre>
/u/foo/bar/page.html:refresh
</pre>

        <p>this leads to the internal invocation of the URL</p>
<pre>
/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
</pre>

        <p>The only missing part is the NPH-CGI script. Although
        one would usually say "left as an exercise to the reader"
        ;-) I will provide this, too.</p>
<pre>
#!/sw/bin/perl
##
##  nph-refresh -- NPH/CGI script for auto refreshing pages
##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
##
$| = 1;

#   split the QUERY_STRING variable
@pairs = split(/&/, $ENV{'QUERY_STRING'});
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $name =~ tr/A-Z/a-z/;
    $name = 'QS_' .
$name;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    #   assign via a symbolic reference instead of eval'ing request data
    $$name = $value;
}
$QS_s = 1 if ($QS_s eq '');
$QS_n = 3600 if ($QS_n eq '');
if ($QS_f eq '') {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
    exit(0);
}
if (! -f $QS_f) {
    print "HTTP/1.0 200 OK\n";
    print "Content-type: text/html\n\n";
    print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
    exit(0);
}

sub print_http_headers_multipart_begin {
    print "HTTP/1.0 200 OK\n";
    $bound = "ThisRandomString12345";
    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
    &print_http_headers_multipart_next;
}

sub print_http_headers_multipart_next {
    print "\n--$bound\n";
}

sub print_http_headers_multipart_end {
    print "\n--$bound--\n";
}

sub displayhtml {
    local($buffer) = @_;
    $len = length($buffer);
    print "Content-type: text/html\n";
    print "Content-length: $len\n\n";
    print $buffer;
}

sub readfile {
    local($file) = @_;
    local(*FP, $size, $buffer, $bytes);
    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
    $size = sprintf("%d", $size);
    open(FP, "&lt;$file");
    $bytes = sysread(FP, $buffer, $size);
    close(FP);
    return $buffer;
}

$buffer = &readfile($QS_f);
&print_http_headers_multipart_begin;
&displayhtml($buffer);

sub mystat {
    local($file) = $_[0];
    local($time);

    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
    return $mtime;
}

$mtimeL = &mystat($QS_f);
$mtime = $mtime;
for ($n = 0; $n &lt; $QS_n; $n++) {
    while (1) {
        $mtime = &mystat($QS_f);
        if ($mtime ne $mtimeL) {
            $mtimeL = $mtime;
            sleep(2);
            $buffer = &readfile($QS_f);
            &print_http_headers_multipart_next;
            &displayhtml($buffer);
            sleep(5);
            $mtimeL = &mystat($QS_f);
            last;
        }
        sleep($QS_s);
    }
}

&print_http_headers_multipart_end;

exit(0);

##EOF##
</pre>
      </dd>
    </dl>

    <h2>Mass Virtual Hosting</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>The <code>&lt;VirtualHost&gt;</code> feature of Apache
      is nice and works great when you just have a few dozen
      virtual hosts. But when you are an ISP and have hundreds of
      virtual hosts to provide, this feature is not the best
      choice.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        Instead of writing one <code>&lt;VirtualHost&gt;</code>
        section per host, we keep the hostnames and their document
        roots in an external map file and let a single ruleset in
        the main server map each request to the proper document
        root, using the <code>Host</code> header as the lookup key:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
##
##  vhost.map
##
www.vhost1.dom:80  /path/to/docroot/vhost1
www.vhost2.dom:80  /path/to/docroot/vhost2
     :
www.vhostN.dom:80  /path/to/docroot/vhostN
</pre>
            </td>
          </tr>
        </table>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
##
##  httpd.conf
##
    :
#   use the canonical hostname on redirects, etc.
UseCanonicalName on

    :
#   add the virtual host in front of the CLF-format
CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %&gt;s %b"
    :

#   enable the rewriting engine in the main server
RewriteEngine on

#   define two maps: one for fixing the URL and one which defines
#   the available virtual hosts with their corresponding
#   DocumentRoot.
RewriteMap    lowercase    int:tolower
RewriteMap    vhost        txt:/path/to/vhost.map

#   Now do the actual virtual host mapping
#   via a huge and complicated single rule:
#
#   1.
#      make sure we don't map for common locations
RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
    :
RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
#
#   2. make sure we have a Host header, because
#      currently our approach only supports
#      virtual hosting through this header
RewriteCond   %{HTTP_HOST}  !^$
#
#   3. lowercase the hostname
RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
#
#   4. lookup this hostname in vhost.map and
#      remember it only when it is a path
#      (and not "NONE" from above)
RewriteCond   ${vhost:%1}  ^(/.*)$
#
#   5. finally we can map the URL to its docroot location
#      and remember the virtual host for logging purposes
RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
    :
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h1>Access Restriction</h1>

    <h2>Blocking of Robots</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>How can we block a really annoying robot from
      retrieving pages of a specific webarea? A
      <code>/robots.txt</code> file containing entries of the
      "Robot Exclusion Protocol" is typically not enough to get
      rid of such a robot.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We use a ruleset which forbids the URLs of the webarea
        <code>/~quux/foo/arc/</code> (perhaps a very deep
        directory-indexed area where the robot traversal would
        create a big server load). We have to make sure that we
        forbid access only to the particular robot, i.e. just
        forbidding the host where the robot runs is not enough.
        This would block users from this host, too. We accomplish
        this by also matching the User-Agent HTTP header
        information.

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>.*
RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
RewriteRule ^<strong>/~quux/foo/arc/</strong>.+   -   [<strong>F</strong>]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Blocked Inline-Images</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Assume we have under http://www.quux-corp.de/~quux/
      some pages with inlined GIF graphics. These graphics are
      nice, so others directly incorporate them via hyperlinks into
      their pages. We don't like this practice because it adds
      useless traffic to our server.</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        While we cannot 100% protect the images from inclusion,
        we can at least restrict the cases where the browser
        sends an HTTP Referer header.

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{HTTP_REFERER} <strong>!^$</strong>
RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
RewriteRule <strong>.*\.gif$</strong>        -                                    [F]
</pre>
            </td>
          </tr>
        </table>

        <p>The ruleset above forbids every GIF request whose
        Referer header is set but does not point into our own
        area. The one below protects a single image, which may
        only be inlined from <code>foo-with-gif.html</code>:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{HTTP_REFERER}         !^$
RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
RewriteRule <strong>^inlined-in-foo\.gif$</strong>   -                        [F]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Host Deny</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>How can we forbid a list of externally configured hosts
      from using our server?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        For Apache &gt;= 1.3b6:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
RewriteRule   ^/.*  -  [F]
</pre>
            </td>
          </tr>
        </table>

        <p>For Apache &lt;= 1.3b6:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine on
RewriteMap    hosts-deny  txt:/path/to/hosts.deny
RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
RewriteRule   !^NOT-FOUND/.* - [F]
RewriteRule   ^NOT-FOUND/(.*)$ /$1
</pre>
            </td>
          </tr>
        </table>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
##
##
hosts.deny 2026## 2027## ATTENTION! This is a map, not a list, even when we treat it as such. 2028## mod_rewrite parses it for key/value pairs, so at least a 2029## dummy value "-" must be present for each entry. 2030## 2031 2032193.102.180.41 - 2033bsdti1.sdm.de - 2034192.76.162.40 - 2035</pre> 2036 </td> 2037 </tr> 2038 </table> 2039 </dd> 2040 </dl> 2041 2042 <h2>URL-Restricted Proxy</h2> 2043 2044 <dl> 2045 <dt><strong>Description:</strong></dt> 2046 2047 <dd>How can we restrict the proxy to allow access to a 2048 configurable set of internet sites only? The site list is 2049 extracted from a prepared bookmarks file.</dd> 2050 2051 <dt><strong>Solution:</strong></dt> 2052 2053 <dd> 2054 We first have to make sure mod_rewrite is below(!) 2055 mod_proxy in the <code>Configuration</code> file when 2056 compiling the Apache webserver (or in the 2057 <code>AddModule</code> list of <code>httpd.conf</code> in 2058 the case of dynamically loaded modules), as it must get 2059 called <em>_before_</em> mod_proxy. 2060 2061 <p>For simplicity, we generate the site list as a 2062 textfile map (but see the <a 2063 href="../mod/mod_rewrite.html#RewriteMap">mod_rewrite 2064 documentation</a> for a conversion script to DBM format). 2065 A typical Netscape bookmarks file can be converted to a 2066 list of sites with a shell script like this:</p> 2067 2068 <table bgcolor="#E0E5F5" border="0" cellspacing="0" 2069 cellpadding="5"> 2070 <tr> 2071 <td> 2072<pre> 2073#!/bin/sh 2074cat ${1:-~/.netscape/bookmarks.html} | 2075tr -d '\015' | tr '[A-Z]' '[a-z]' | grep href=\" | 2076sed -e '/href="file:/d;' -e '/href="news:/d;' \ 2077 -e 's|^.*href="[^:]*://\([^:/"]*\).*$|\1 OK|;' \ 2078 -e '/href="/s|^.*href="\([^:/"]*\).*$|\1 OK|;' | 2079sort -u 2080</pre> 2081 </td> 2082 </tr> 2083 </table> 2084 2085 <p>We redirect the resulting output into a text file 2086 called <code>goodsites.txt</code>. 
        It now looks similar
        to this:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
www.apache.org       OK
xml.apache.org       OK
jakarta.apache.org   OK
perl.apache.org      OK
...
</pre>
            </td>
          </tr>
        </table>

        <p>We reference this site file within the configuration
        for the <code>VirtualHost</code> which is responsible for
        serving as a proxy (often not port 80, but 81, 8080 or
        8008).</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
&lt;VirtualHost *:8008&gt;
  ...
  RewriteEngine On
  # Either use the (plaintext) allow list from goodsites.txt
  RewriteMap    ProxyAllow   txt:/usr/local/apache/conf/goodsites.txt
  # Or, for faster access, convert it to a DBM database:
  #RewriteMap   ProxyAllow   dbm:/usr/local/apache/conf/goodsites
  # Match lowercased hostnames
  RewriteMap    lowercase    int:tolower
  # Here we go:
  # 1) first lowercase the site name and strip off a :port suffix
  RewriteCond   ${lowercase:%{HTTP_HOST}} ^([^:]*).*$
  # 2) next look it up in the map file.
  #    "%1" refers to the group captured by the previous RewriteCond.
  #    If the result is "OK", proxy access is granted.
  RewriteCond   ${ProxyAllow:%1|DENY}  !^OK$  [NC]
  # 3) Disallow proxy requests if the site was _not_ tagged "OK":
  RewriteRule   ^proxy:  -  [F]
  ...
&lt;/VirtualHost&gt;
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Proxy Deny</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>How can we forbid a certain host or even a user of a
      special host from using the Apache proxy?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We first have to make sure mod_rewrite is below(!)
        mod_proxy in the <code>Configuration</code> file when
        compiling the Apache webserver.
        This way it gets called
        <em>before</em> mod_proxy. Then we configure the
        following for a host-dependent deny...

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{REMOTE_HOST} <strong>^badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
</pre>
            </td>
          </tr>
        </table>

        <p>...and this one for a user@host-dependent deny:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  <strong>^badguy@badhost\.mydomain\.com$</strong>
RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Special Authentication Variant</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>Sometimes a very special authentication is needed, for
      instance an authentication which checks for a set of
      explicitly configured users.
      Only these should receive
      access, and without explicit prompting (which would occur
      when using Basic Auth via mod_auth).</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        We use a list of rewrite conditions to exclude all except
        our friends:

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend1@client1\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend2@client2\.quux-corp\.com$</strong>
RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} <strong>!^friend3@client3\.quux-corp\.com$</strong>
RewriteRule ^/~quux/only-for-friends/      -                                 [F]
</pre>
            </td>
          </tr>
        </table>
      </dd>
    </dl>

    <h2>Referer-based Deflector</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>How can we program a flexible URL Deflector which acts
      on the "Referer" HTTP header and can be configured with as
      many referring pages as we like?</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        Use the following really tricky ruleset...

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteMap  deflector txt:/path/to/deflector.map

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
RewriteRule ^.* %{HTTP_REFERER} [R,L]

RewriteCond %{HTTP_REFERER} !=""
RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
</pre>
            </td>
          </tr>
        </table>

        <p>...
        in conjunction with a corresponding rewrite
        map:</p>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
##
##  deflector.map
##

http://www.badguys.com/bad/index.html    -
http://www.badguys.com/bad/index2.html   -
http://www.badguys.com/bad/index3.html   http://somewhere.com/
</pre>
            </td>
          </tr>
        </table>

        <p>This automatically redirects the request back to the
        referring page (when "-" is used as the value in the map)
        or to a specific URL (when a URL is specified in the map
        as the second argument).</p>
      </dd>
    </dl>

    <h1>Other</h1>

    <h2>External Rewriting Engine</h2>

    <dl>
      <dt><strong>Description:</strong></dt>

      <dd>A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem?
      There seems to be no solution using mod_rewrite...</dd>

      <dt><strong>Solution:</strong></dt>

      <dd>
        Use an external rewrite map, i.e. a program which acts
        like a rewrite map. It is started once when Apache starts
        up, receives the requested URLs on STDIN, and has to write
        the resulting (usually rewritten) URL to STDOUT (in the
        same order!).
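        <p>As a rough sketch of this line-by-line protocol, the
        loop below answers each key read from STDIN with exactly
        one line on STDOUT. The function names
        <code>rewrite_key</code> and <code>serve_map</code> are
        hypothetical and chosen for illustration only; the
        <code>foo/</code> to <code>bar/</code> mapping simply
        mirrors the Perl program shown further down. It is a
        sketch under these assumptions, not a tested production
        map program.</p>

```shell
#!/bin/sh
# Hypothetical helper (illustration only): map one lookup key to its
# result by rewriting a leading "foo/" to "bar/"; other keys pass through.
rewrite_key() {
    case "$1" in
        foo/*) printf '%s\n' "bar/${1#foo/}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# Hypothetical main loop: read one key per line from STDIN and write
# exactly one answer line to STDOUT, in the same order as the input.
# Assumption: the shell's builtin printf writes each line right away,
# so no answers are held back in an output buffer.
serve_map() {
    while IFS= read -r key; do
        rewrite_key "$key"
    done
}

serve_map
```

        <p>Such a script would be referenced through a
        <code>prg:</code> map in the same way as the Perl program
        in the configuration below.</p>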

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
RewriteEngine on
RewriteMap    quux-map       <strong>prg:</strong>/path/to/map.quux.pl
RewriteRule   ^/~quux/(.*)$  /~quux/<strong>${quux-map:$1}</strong>
</pre>
            </td>
          </tr>
        </table>

        <table bgcolor="#E0E5F5" border="0" cellspacing="0"
        cellpadding="5">
          <tr>
            <td>
<pre>
#!/path/to/perl

#   disable buffered I/O, which would otherwise
#   make the Apache server hang waiting for output
$| = 1;

#   read URLs one per line from stdin and
#   generate substitution URL on stdout
while (&lt;&gt;) {
    s|^foo/|bar/|;
    print $_;
}
</pre>
            </td>
          </tr>
        </table>

        <p>This is a demonstration-only example and just rewrites
        all URLs <code>/~quux/foo/...</code> to
        <code>/~quux/bar/...</code>. Actually you can program
        whatever you like. But notice that while such maps can
        also be <strong>used</strong> by an average user, only the
        system administrator can <strong>define</strong> them,
        because <code>RewriteMap</code> is only allowed in the
        server configuration.</p>
      </dd>
    </dl>
    <hr />

    <h3 align="CENTER">Apache HTTP Server Version 1.3</h3>
    <a href="./"><img src="../images/index.gif" alt="Index" /></a>
    <a href="../"><img src="../images/home.gif" alt="Home" /></a>

    </blockquote>
  </body>
</html>
