# This file contains examples and an explanation for the RULESFILE / RULE
# feature.
#
# Rules for Lynx are experimental. They provide a rudimentary capability
# for URL rejection and substitution based on string matching.
# Most users and most installations will not need this feature; it is here
# in case you find it useful. Note that this may change or go away in
# future releases of Lynx; if you find it useful, consider describing your
# use of it in a message to <lynx-dev@nongnu.org>.
#
# Syntax:
# =======
# Summary of common forms:
#
#    Fail          URL1            [CONDITION]
#    Map           URL1  URL2      [CONDITION]
#    Pass          URL1  [URL2]    [CONDITION]
#    Redirect      URL1  URL2      [CONDITION]
#    RedirectPerm  URL1  URL2      [CONDITION]
#    UseProxy      URL1  PROXYURL  [CONDITION]
#    UseProxy      URL1  "none"    [CONDITION]
#
#    Alert         URL1  MESSAGE   [CONDITION]
#    AlwaysAlert   URL1  MESSAGE   [CONDITION]
#    UserMsg       URL1  MESSAGE   [CONDITION]
#    InfoMsg       URL1  MESSAGE   [CONDITION]
#    Progress      URL1  MESSAGE   [CONDITION]
#
# As you may have guessed, comments are introduced by a '#' character.
# Rules have the general form
#    Operator Operand1 [Operand2] [CONDITION]
# with words separated by whitespace. Words containing spaces can be quoted
# with "double quotes". Although normally this should not be necessary
# for URLs, it has to be used for MESSAGE Operands in Alert etc.
# See below for an explanation of the optional CONDITION.
#
# Recognized operators are
#
#    Fail URL1
# Reject access to this URL, stop processing further rules.
#
#    Map URL1 URL2
# Change the current URL to URL2, then continue processing.
#
#    Pass URL1 [URL2]
# Accept this URL and stop processing further rules; if URL2
# is given, apply this as the last mapping.
# See the next item for reasons why you generally don't want to "pass"
# a changed URL.
#
#    RedirectTemp URL1 URL2
#    RedirectPerm URL1 URL2
#    Redirect [STATUS] URL1 URL2
# Stop processing further rules and redirect to URL2, just as if lynx had
# received an HTTP redirection with URL2 as the new location. This means that
# URL2 is subject to any applicable permission checking; if it passes, a new
# request will be issued (which may result in a new round of rules checking,
# with a new "current URL") or the new URL might be taken from the cache, and,
# after successful loading, lynx' idea of what the loaded document's URL is
# will be fully updated. All this does not happen if you just "pass" a changed
# URL (or let it fall through), so this is generally the preferred way for
# substituting URLs.
# If the RedirectPerm variant is used, or if the optional STATUS word is
# supplied and is either "permanent" or "301", act as if lynx had received a
# permanent redirection (with HTTP status 301). In most cases this will not
# make a noticeable difference. Lynx may cache the location in a special way
# for 301 redirections, so that the redirection is followed immediately the
# next time the same original URL is accessed, without re-checking of rules.
# Therefore the permanent variant should never be used if the desired outcome
# of rules processing depends on variable conditions (see CONDITIONS below) or
# on setting a special flag (see next item).
#
#    PermitRedirection URL1
# Mark the following redirection as permitted, and continue processing. Some
# redirection locations are normally not allowed, because permitting them in a
# response from an arbitrary remote server would open a security hole, and
# others are not allowed if certain restriction options are in effect. Among
# redirection locations normally always forbidden are lynxprog: and lynxexec:
# schemes. With "default" anonymous restrictions in effect, many URL schemes
# are disallowed if the user would not be allowed to use them with 'g'oto.
# This rule allows overriding the permission checking if rules processing ends
# with a Redirect (including the RedirectPerm or RedirectTemp forms). It is
# ignored otherwise; in particular, it does not influence acceptance if rules
# processing ends with a "Pass" and a real redirection is received in the
# subsequent HTTP request. If redirections are chained, it only applies to the
# redirection that ends the same rules cycle. Note that the new URL is still
# subject to other permission checks that are not specific to redirections; but
# using this rule may still weaken the expected effect of -anonymous,
# -validate, -realm, and other restriction options, including TRUSTED_EXEC and
# similar in lynx.cfg, so be careful where you redirect to if restrictions are
# important!
#
#    UseProxy URL1 PROXYURL
# Stop processing further rules, and force access through the proxy given by
# PROXYURL. PROXYURL should have the same form as required for foo_proxy
# environment variables and lynx.cfg options, i.e., (unless you are trying to
# do something unusual) "http://some.proxy-server.dom:port/". This rule
# overrides any use of a proxy (or external gateway) that might otherwise apply
# because of environment variables or lynx.cfg options; it also overrides any
# "no_proxy" settings.
#
#    UseProxy URL1 none
# Mark request as NOT using any proxy (or external gateway), and continue
# processing(!). For a request marked this way, any subsequent UseProxy
# rule with a PROXYURL will be ignored, and any use of a proxy (or external
# gateway) that might otherwise apply because of environment variables or
# lynx.cfg options will be overridden. Note that the marking will not
# survive a Redirect rule (since that will result, if successful, in a
# new request).
#
#    Alert URL1 MESSAGE
#    AlwaysAlert URL1 MESSAGE
#    UserMsg URL1 MESSAGE
#    InfoMsg URL1 MESSAGE
#    Progress URL1 MESSAGE
# These produce various kinds of statusline messages, differing in whether
# a pause is enforced and in its duration, immediately when the rule is
# applied. AlwaysAlert shows the message text even in non-interactive mode
# (-dump, -source, etc.). Rule processing continues after the message is
# shown. As usual, these rules only apply if URL1 matches. MESSAGE is
# the text to be displayed; it can contain one occurrence of "%s", which
# will be replaced by the current URL. Literal '%' characters should be
# doubled as "%%".
#
# Rules are processed sequentially, first to last, for each request; a rule
# applies if the current URL matches URL1. The current URL is initially the
# URL for the resource the user is trying to access, but may change as the
# result of applied Map rules. Case-sensitive (!) string comparison is used;
# in addition URL1 can contain one '*' which is interpreted as a wildcard
# matching 0 or more characters. So if for example
# "http://example.com/dir/doc.html" is requested, it would match any of
# the following:
#    Pass http:*
#    Pass http://example.com/*.html
#    Pass http://example.com/*
#    Pass http://example*
#    Pass http://*/doc.html
# but not:
#    Pass http://example/*
#    Pass http://Example.COM/dir/doc.html
#    Pass http://Example.COM/*
#
# If a URL2 is given and also contains a '*', that character will be
# replaced by whatever matched in URL1. Processing stops with the
# first matching "Fail" or "Pass" or when the end of the rules is reached.
# If the end is reached without a "Fail" or "Pass", the URL is allowed
# (equivalent to a final "Pass *").
#
# The requested URL will have been transformed to Lynx' normal
# representation.
# This means that local file resources should be expected in the form
# "file://localhost/<path using slash separators>",
# not in the machine's native representation for filenames.
#
# Anyone with experience configuring the venerable CERN httpd server will
# recognize some of the syntax - in fact, the code implementing rules goes
# back to a common ancestor. But note the differences: all URLs and
# URL-patterns here have to be given as absolute URLs, even for local files.
# (Absolute URLs don't imply proxying.)
#
# CONDITIONS
# ----------
# All rules mentioned can be followed by an optional CONDITION, which can
# be used to further restrict when the rule should be applied (in addition
# to the match on URL1). A CONDITION takes one of the forms
#    "if" CONDITIONFLAG
#    "unless" CONDITIONFLAG
# and currently two condition flags are recognized:
#    "userspecified" (or abbreviated "userspec")
#    "redirected"
# To explain these, first some terms need to be defined. A "request"
# is...
#
# A user action (like following a link, or entering a 'g'oto URL) can either be
# rejected immediately (for example, because of restrictions in effect, or
# because of invalid input), or can generate a "request". For the purpose of
# this discussion, a "request" is the sequence of processing done by lynx,
# which might ultimately lead to an actual network request and loading and
# display of data; a request can also result in rejection (for example, some
# restrictions are checked at this stage), or in a redirection. A redirection
# in turn can be rejected (which makes the request fail), or can automatically
# generate a new request. A "request chain" is the sequence of one or more
# requests triggered by the same user event that are chained together by
# redirections.
# For each request, some URL schemes are handled (or rejected) specially (see
# Limitation 1 below); the others are passed to the generic access code. Rules
# processing occurs at the beginning of the generic access code, before a
# request is dispatched to the scheme-specific protocol module (but after
# checking whether the request can be satisfied by re-displaying an already
# cached document).
# With these definitions, the meaning of the possible CONDITIONFLAGS is as
# follows:
#
#    if redirected
# The rule applies if the current request results from a redirection;
# whether that was a real HTTP redirection or one generated by a rule
# in the previous request makes no difference. In other words, the
# condition is true if the current request is not the first one in the
# request chain.
#
#    if userspecified
# The rule applies if the initial URL of the request chain was specified
# by the user. Lynx marks a request as "user specified" for URLs that
# come from 'g'oto prompts, as well as for following links in a bookmark
# or Jump file and some other special (lynx-generated) pages that may
# contain URLs that were typed in by the user.
# Note that this is not a property of the request, but of the whole request
# chain (based on where the first request's URL came from). The current
# URL may differ from what the user typed
# - because of initial fixups, including conversion of Guess-URLs and file
#   paths to full URLs,
# - because of Map rules applied, and/or
# - because of a previous redirection.
# So to make reasonably sure a suspicious or potentially dangerous URL has
# been entered by the user, i.e. is not a link or external redirection
# location that cannot be trusted, a combination of "userspecified" and
# "redirected" flags should be used, for example
#    Fail URL1 unless userspecified
#    Fail URL1 if redirected
#    ...
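#
# As a further illustration of CONDITIONS (a sketch only: the host name
# "www.example.com" and the message text are made up for this example, and
# the rule has not been tested against any particular lynx version), a
# message rule can be restricted in the same way, so that it is only shown
# for requests that are not the first one in their request chain:
#    InfoMsg http://www.example.com/* "Now at %s (redirected)" if redirected
# Following the MESSAGE syntax described above, "%s" stands for the current
# URL. Message rules do not stop processing, so any later rules are still
# applied to the same request.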
#
# CAVEAT
# ======
# First, to squash any false expectations, an example of what NOT TO DO.
# It might be expected that a rule like
#    Fail file://localhost/etc/passwd     # <- DON'T RELY ON THIS
# could be used to prevent access to the file "/etc/passwd". This might
# fool a naive user, but the more sophisticated user could still gain
# access, by experimenting with other forms like (@@@ untested)
# "file://<machine's domain name>/etc/passwd" or "/etc//passwd"
# or "/etc/p%61sswd" or "/etc/passwd?" or "/etc/passwd#X" and so on.
# There are many URL forms for accessing the same resource, and Lynx
# just doesn't guarantee that URLs for the same resource will look the
# same way.
#
# The same reservation applies to any attempts to block access to unwanted
# sites and so on. This isn't the right place for implementing it.
# (Lynx has a number of mechanisms documented elsewhere to restrict access,
# see the INSTALLATION file, lynx.cfg, lynx -help, lynx -restrictions.)
#
# Some more useful applications:
#
# 1. Disabling URLs by access scheme
# ----------------------------------
#    Fail gopher:*
#    Fail finger:*
#    Fail lynxcgi:*
#    Fail LYNXIMGMAP:*
# This should work (but no guarantees) because Lynx canonicalizes
# the case of recognized access schemes and does not interpret
# %-escaping in the scheme part (@@@ always?)
#
# Note that for many access schemes Lynx already has mechanisms to
# restrict access (see lynx.cfg, -help, -restrictions, etc.); others
# have to be specifically enabled. Those mechanisms should be used
# in preference.
# Note especially Limitation 1 below.
# This can be used for the remaining cases, or in addition by the
# more paranoid. Note that disabling "file:*" will also make many
# of the special pages generated by lynx as temporary files (INFO,
# history, ...)
# inaccessible; on the other hand it doesn't prevent
# _writing_ of various temp files - probably not what you want.
#
# You could also direct access for a scheme to a brief text explaining
# why it's not available:
#    Redirect news:* http://localhost/texts/newsserver-is-broken.html
#
# 2. Preventing accidental access
# -------------------------------
# If there is a page or site you don't want to access for whatever
# reason (say there's a link to it that crashes Lynx [don't forget to
# report a bug], or it starts sending you a 5 MB file you don't
# want, or you just don't like the people...), you can prevent yourself
# from accidentally accessing it:
#    Fail http://bad.site.com/*
#
# 3. Compressed files
# -------------------
# You have downloaded a bunch of HTML documents, and compressed them
# to save space. Then you discover that links between the files don't
# work, because they all use the names of the uncompressed files. The
# following kind of rule will allow you to navigate, invisibly accessing
# the compressed files:
#    Map file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
# or, perhaps better:
#    Redirect file://localhost/somedir/*.html file://localhost/somedir/*.html.gz
#
# 4. Use local copies
# -------------------
# You have downloaded a tree of HTML documents, but there are many links
# between them that still point to the remote location. You want to access
# the local copies instead; after all, that's why you downloaded them.
# You could start editing the HTML, but the following might be simpler:
#    Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html
# Or even combine this with compressing the files:
#    Map http://remote.com/docs/*.html file://localhost/home/me/docs/*.html.gz
#
# Again, replacing the "Map" with "Redirect" is probably better - it will
# allow you to see the _real_ location on the lynx INFO screen or in the
# HISTORY list, will avoid duplicates in the cache if the same document is
# loaded with two different URLs, and may allow you to 'e'dit the local
# copy from within lynx if you feel like it.
#
# 5. Broken links etc.
# --------------------
# A user has moved from http://www.siteA.com/~jdoe to http://siteB.org/john,
# or http://www.provider.com/company/ has moved to their own server
# http://www.company.com, but there are still links to the old location
# all over the place; they now are broken or lead to a stupid "this page
# has moved, please update your bookmarks. Refresh in 5 seconds" page
# which you're tired of seeing. This will not fix your bookmarks, and
# it will let you see the outdated URLs for longer (Limitation 3 below),
# but for a quick fix:
#    Redirect http://www.siteA.com/~jdoe/* http://siteB.org/john/*
#    Redirect http://www.provider.com/company/* http://www.company.com/*
#
# You could use "Map" instead of "Redirect", but this would let you see the
# outdated URLs for longer and even bookmark them, and you are likely to
# create invalid links if not all documents from a site are mapped
# (Limitation 3).
#
# 6. DNS troubles
# ---------------
# A special case of broken links. If a site is inaccessible because the
# name cannot be resolved (your or their name server is broken, or the
# name registry once again made a mistake, or they really didn't pay in
# time...)
# but you still somehow know the address; or if name lookups are
# just too slow:
#    Map http://www.somesite.com/* http://10.1.2.3/*
# (You could do the equivalent more cleanly by adding an entry to the hosts
# file, if you have access to it.)
#
# Or, if a name resolves to several addresses of which one is down, and the
# DNS hasn't caught up:
#    Map http://www.w3.org/* http://www12.w3.org/*
#
# Note that this can break access to some name-based virtually hosted sites.
#
# In this case use of "Map" is probably preferred over "Redirect", as long
# as the URL on the left side contains the real and preferred hostname or
# the problem is only temporary.
#
# 7. Avoid redirections
# ---------------------
# Some sites have a habit of providing links that don't go to the destination
# directly but always force redirection via some intermediate URL. The
# delay imposed by this, especially for users with slower connections and
# for overloaded servers, can be avoided if the intermediate URLs always
# follow some simple pattern: we can then anticipate the redirect that will
# inevitably follow and generate it internally. For example,
#    Redirect http://lwn.net/cgi-bin/vr/* http://*
#
# Warning: The page authors may not like this circumvention. Often the
# redirection is wanted by them to track access, sometimes in connection
# with cookies. Some sites may employ mechanisms that defeat the shortcut.
# It is your responsibility to decide whether use of this feature is
# acceptable. (But note that the same effect can be achieved anyway for
# any link by editing the URL, e.g. with the ELGOTO ('E') key in Lynx, so
# a shortcut like this does not create some new kind of intrusion.)
#
# 8. Detailed proxy selection
# ---------------------------
# Basic use for this one should be obvious, if you have a need for it.
# It simply allows selecting use (or non-use) of proxies on a more detailed
# level than the traditional <scheme>_proxy and no_proxy variables, as well
# as using different proxies for different sites.
# For example, to request access through an anonymizing proxy for all pages
# on a "suspicious" site:
#    UseProxy http://suspicious.site/* http://anonymyzing.proxy.dom/
# (as long as all URLs really have a matching form, not some alternative
# like <http://suspicious.site:80/> or <http://SuSpIcIoUs.site/>!)
#
# To access some site through a local squid proxy, running on the same host
# as lynx, except for some image types (say because you rarely access images
# with lynx anyway, and if you do, you don't want them cached by the proxy):
#    UseProxy http://some.site/*.gif none
#    UseProxy http://some.site/*.jpg none
#    UseProxy http://some.site/* http://localhost:3128/
# Note that order is important here: a UseProxy rule with a PROXYURL stops
# rule processing, so the "none" rules have to come first.
#
# To exempt a local address from all proxying:
#    UseProxy http://local.site/* none
#
# Note however that for some purposes the "no_proxy" setting may be better
# suited than "UseProxy ... none", because of its different matching logic
# (see comments in lynx.cfg).
#
# 9. Invent your own scheme
# -------------------------
# Suppose you want to teach lynx to handle a completely new URL scheme.
# If what's required for the new scheme is already available in lynx in
# _some_ way, this may be possible with some inventive use of rules.
# As an example, let's assume you want to introduce a simple "man:" scheme
# for showing manual pages, so (for a Unix-like system, at least) "man:lynx"
# would display the same help information as the "man lynx" command and so
# on (we ignore section numbers etc. for simplicity here).
# First, since lynx doesn't know anything about a "man:" scheme, it will
# normally reject any such URLs at an early stage.
# However, a trick exists to bypass that hurdle: define a man_proxy
# environment variable *outside of lynx, before starting lynx* (it won't work
# in lynx.cfg); the actual value is unimportant and won't actually be used.
# For example, in your shell:
#    export man_proxy=X
#
# If you already have some kind of HTTP-accessible man gateway available,
# the task then probably just amounts to transforming the URL into the right
# form. For one such gateway (in this case, a CGI script running on the
# local machine), the rule
#    Redirect man:* http://localhost/cgi-bin/dwww?type=runman&location=*/
# or, alternatively,
#    UseProxy man:* none
#    Map man:* http://localhost/cgi-bin/dwww?type=runman&location=*/
# does it; for other setups the right-hand side just has to be modified
# appropriately. The "UseProxy" is to make sure the bogus man_proxy gets
# ignored.
#
# If no CGI-like access is available, you might want to invoke your system's
# man command directly for a man: URL. Here is some discussion of how this
# could be done, and why ultimately you may not want to do it; this is also
# an opportunity to show examples of how some of the rules and conditions
# can be used that haven't been discussed in detail elsewhere.
# Lynx provides the lynxexec: (and the similar lynxprog:) scheme for running
# (nearly) arbitrary commands locally. At the heart of employing it for
# man: would be a rule like this:
#    Redirect man:* "lynxexec:/usr/bin/man *"
# (It is a peculiarity of this scheme that the literal space and quoting
# are necessary here. Also note that Map cannot be used here instead of
# Redirect, since lynxexec, as a special kind of URL, needs to be handled
# "early" in a request.)
# Of course, execution of arbitrary commands is a potentially dangerous
# thing.
# lynxexec has to be specifically enabled at compile time and in
# lynx.cfg (or with command line options), and there are various levels
# of control, too much to go into here. It is assumed in the following that
# lynxexec has been enabled to the degree necessary (allow /usr/bin/man
# execution) but hopefully not too much.
# What needs to be prevented is that allowing local execution of the man
# command might unintentionally open up unwanted execution of other commands,
# possibly by some trick that could be exploited. For example, redirecting
# man:* as above, the URL "man:lynx;rm -r *" could result in the command
# "man lynx;rm -r *" being executed by the system, with obviously disastrous
# results. (This particular example won't actually work, for several reasons;
# but for the purpose of discussion let's assume it did; there may be similar
# ones that do.)
# Because of such dangers, redirection to a lynxexec: URL is normally never
# accepted by lynx. We need at least a PermitRedirection rule to override
# this protective limitation:
#    PermitRedirection man:*
#    Redirect man:* "lynxexec:/usr/bin/man *"
# But now we have potentially opened up local execution more than is
# acceptable via the man: scheme, so this needs to be examined.
# There are two aspects to security here: (1) restricting the user, and (2)
# protecting the user. The first could also be phrased as protecting the
# system from the user; the second as preventing lynx (and the system) from
# doing things the user doesn't really want. Aspect (1) is very important
# for setups providing anonymous guest accounts and similarly restricted
# environments. (Otherwise shell access is normally allowed, and trying to
# protect the system in lynx would be rather pointless.)
# As far as access to some URLs is concerned, the difference can be
# characterized in terms of which sources of URLs are trusted enough to allow
# access: for (1), only links occurring in a limited number of documents are
# trusted enough for some (or all) URLs; user input at 'g'oto prompts and the
# like is not (if not completely disabled). For (2) and assuming a user with
# normal shell privileges, the user may be trusted enough to accept any URL
# explicitly entered, but URLs from arbitrary external sources are not -
# someone might try to use them to trick the user (by following an
# innocent-looking link) or lynx (by following a redirection) into doing
# something undesirable.
#
# In the following we are concerned with (2); it is assumed that providers
# of anonymous accounts would not want to follow this path, and would have
# no need for additional schemes that imply local execution anyway. (For
# one thing, with the man example they would have to carefully check that
# users cannot break out of the man command to a local shell prompt.)
#
# Getting back to the example, it was already mentioned that lynx does not
# allow redirections to lynxexec. In fact this continues to be disallowed
# for real redirections received from HTTP servers. But we have introduced
# a new man: scheme, and the lynx code that does the redirection checking
# doesn't know anything about special considerations for man: URLs, so
# an external HTTP server might send a redirection message with "Location:
# man:<something>", which lynx would allow, and which would in turn be
# redirected by our rule to "lynxexec:/usr/bin/man <something>". Unless
# we are 100% sure that either this can never happen or that the lynxexec
# URL resulting from this can have no harmful effect, this needs to be
# prevented.
# It can be done by checking for the "redirected" condition,
# either by putting something like (the first line is of course optional)
#    Alert man:* "Redirection to man: not allowed" if redirected
#    Fail man:* if redirected
# somewhere before the Redirect rule, or, reversing the logic, by adding
# a condition to the redirection rules, i.e. they become
#    PermitRedirection man:* unless redirected
#    Redirect man:* "lynxexec:/usr/bin/man *" unless redirected
# (actually, putting the condition on either one of the rules would be
# sufficient). The second variant assumes that the attempted access to
# man: via redirection will ultimately fail because there is no other way
# to handle such URLs.
#
# The above should take care of rejecting man: URLs from redirections, but
# what about regular links in HTML (like <A HREF="man:...">)? As long as
# it can be assumed that the user will always inspect each and every link
# before following it, and never follow a link that can have harmful effect,
# no further restrictions are necessary. But this is a very big assumption,
# unrealistic except perhaps in some single-user setups where the user is
# identical with the rule writer. So normally most links have to be
# regarded as suspect, and only URLs entered by the user can be accepted:
#    Alert man:* "Redirection to man: not allowed" if redirected
#    Fail man:* if redirected
#    Alert man:* "Link to man: not allowed" unless userspecified
#    Fail man:* unless userspecified
#
# With these restrictions we have limited the ways our new man: scheme can
# be used rather severely, to the point where its usefulness is questionable.
# In addition to 'g'oto prompts, it may work in Jump files; also, should
# links to man:<something> appear in HTML text, the user could retype them
# manually or use the ELGOTO ('E') command with some trivial editing (like
# adding a space) to "confirm" the URL.
# Even if the precautions outlined above are followed: THIS TEXT DOES NOT
# IMPLY ANY PROMISE THAT, BY FOLLOWING THE EXAMPLES, LYNX WILL BE SAFE. On
# the other hand, some of the precautions *may* not be necessary: it is
# possible that careful use of TRUSTED_EXEC options in lynx.cfg could offer
# enough protection while making the new scheme more useful.
#
# If all this seems a bit too scary, that's intentional; it should be noted
# that these considerations are not in general necessary for "harmless" URL
# schemes, but appropriate for this "extreme" example. One last remark
# regarding the hypothetical man scheme: instead of implementing it through
# "lynxexec:" or "lynxprog:", it would be somewhat safer to use "lynxcgi:"
# if it is supported. A simple lynxcgi script would have to write
# the man page to stdout (either converted to text/html or as plain text,
# preceded by an appropriate Content-Type header line), and all necessary
# checking for special shell characters would be done within the script -
# lynx does not use the system() function to run the script.
#
# Other Limitations
# =================
# First, see CAVEAT above. There are other limitations:
#
# 1. Applicable URL schemes
# -------------------------
# Rules processing does not apply to all URL schemes. Some are
# handled differently from the generic access code, therefore rules
# for such URLs will never be "seen". This limitation applies at
# least to lynxexec:, lynxprog:, mailto:, LYNXHIST:, LYNXMESSAGES:,
# LYNXCFG:, and LYNXCOMPILEOPTS: URLs. You shouldn't be tempted
# to try to redirect most of these schemes anyway, but this also
# makes it impossible to disable them with "Fail" rules.
#
# Also, a scheme has to be known to Lynx in order to get as far as
# applying rules - you cannot just define your own new foobar: scheme
# and then map it to something here, but see Application 9, above,
# for a workaround.
#
# 2. No re-checking
# -----------------
# When a URL is mapped to a different one, the new URL is not checked
# again for compliance with most restrictions established by -anonymous,
# -restrictions, lynx.cfg and so on. This can be regarded as a feature:
# it allows specific exceptions. Of course it means that users for
# whom any restrictions must be enforced cannot have write access to a
# personal rules file, but that should be obvious anyway!
# This limitation does not apply if "Redirect" is used; in that case
# the new URL will always be re-examined.
#
# 3. Mappings are invisible
# -------------------------
# Changing the URL with "Map" or "Pass" rules will in general not be
# visible to the user, because it happens at a late stage of processing
# a request (similar to directing a request through a proxy). One
# can think of two kinds of URL for every resource: a "Document URL" as
# the user sees it (on INFO page, history list, status line, etc.), and
# a "physical URL" used for the actual access. Rules change only the
# physical URL. This is different from the effect of HTTP redirection.
# Often this is bad, sometimes it may be desirable.
#
# Changing the URL can create broken links if a document has relative URLs,
# since they are taken to be relative to the "Document URL" (if no BASE tag
# is present) when the HTML is parsed.
#
# This limitation does not apply if "Redirect" is used - the new location
# will be visible to the user, and will be used by lynx for resolving
# relative URLs within the document.
#
# 4. Interaction with proxying
# ----------------------------
# Rules processing is done after most other access checks, but before
# proxy (and gateway) settings are examined. A "Fail" rule works
# as expected, but when the URL has been mapped to a different one,
# the subsequent proxy checking can get confused. If it decides that
# access is through a proxy or gateway, it will generally use the
# original URL to construct the "physical" URL, effectively overriding
# the mapping rules. If the mapping is to a different access scheme
# or hostname, proxy checking could also be fooled into using a proxy when
# it shouldn't, not using one when it should, or (if different proxies
# are used for different schemes) using the wrong proxy. So "just
# don't do that"; in some cases setting the no_proxy variable will help.
# Example 3 happens to work nicely if there is a http_proxy but no
# ftp_proxy.
#
# This limitation does not come into play if a "UseProxy" rule is applied,
# in either of its two forms: with a PROXYURL, proxying is fully under
# the control of the rules author, and with "none", subsequent proxy
# and gateway checking is completely disabled. It is therefore a good
# idea to combine any "Map" and "Pass" rules that might result in passing
# the changed URL with explicit "UseProxy" rules, if the rules file is
# expected to be used together with proxying; or else always use "Redirect"
# instead of simple passing.
#
# 5. Case-sensitive matching
# --------------------------
# The matching logic is generic and string-based. It doesn't know anything
# about URL syntax, and so it cannot know in which parts of a URL case
# matters and where it doesn't. As a result, all comparisons are
# case-sensitive. If (a limited number of) case variations of a URL need
# to be dealt with, several rules can be used instead of one.
# In particular, this makes "UseProxy ...
# none" in some ways more limited
# than a no_proxy setting.
#
# 6. Redirection differences
# --------------------------
# For some URLs lynx never checks after a request whether a redirection
# occurs; that makes the "Redirect" rule useless for such URLs (in addition
# to those mentioned under Limitation 1). Some of them are some gopher
# types, telnet: and similar in most situations, newspost: and similar,
# lynxcgi:, and some other private types. Trying to redirect these will
# make access fail. You probably don't want to change such URLs anyway,
# but if you feel you must, try using "Map" and "Pass" instead.
#
# The -noredir command line option only applies for real HTTP redirection
# responses; Redirect rules are still applied. Also for certain other
# command line options (-mime_header, -head) and command keys (HEAD) lynx
# shows the redirection message (or part of it) in case of a real HTTP
# redirection, instead of following the redirection. Here, too, a Redirect
# rule remains effective (there is no redirection message to show, after all).
#
# 7. URLs required
# ----------------
# Full absolute URLs (modulo possible "*" matching wildcards) are required
# in rules. Strings like "www.somewhere.com" or "/some/dir/some.file" or
# "www.somewhere.com/some/dir/some.file" are not URLs. Lynx may accept
# them as user input, as abbreviated forms for URLs; but by the time the
# rules get checked, those have been converted to full URLs, if they can
# be recognized. This also means that rules cannot influence which strings
# typed at a 'g'oto prompt are recognized as URLs - rules processing kicks
# in later.
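#
# As an illustration (a sketch only: the host and path are placeholders,
# and the assumption is that a typed "www.somewhere.com/some/dir/" gets
# converted to an http: URL before rules are checked), a rule meant to keep
# you from accidentally loading pages below that directory would have to
# be written with the full URL:
#    Fail http://www.somewhere.com/some/dir/*
# A pattern like "www.somewhere.com/some/dir/*" would not be expected to
# match, since URL1 is compared against the already converted current URL.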