1<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
4<html xmlns="http://www.w3.org/1999/xhtml">
5  <head>
6    <meta name="generator" content="HTML Tidy, see www.w3.org" />
7
8    <title>Apache API notes</title>
9  </head>
10  <!-- Background white, links blue (unvisited), navy (visited), red (active) -->
11
12  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
13  vlink="#000080" alink="#FF0000">
14        <div align="CENTER">
15      <img src="../images/sub.gif" alt="[APACHE DOCUMENTATION]" />
16
17      <h3>Apache HTTP Server Version 1.3</h3>
18    </div>
19
20
21    <h1 align="CENTER">Apache API notes</h1>
22    These are some notes on the Apache API and the data structures
23    you have to deal with, <em>etc.</em> They are not yet nearly
24    complete, but hopefully, they will help you get your bearings.
25    Keep in mind that the API is still subject to change as we gain
26    experience with it. (See the TODO file for what <em>might</em>
27    be coming). However, it will be easy to adapt modules to any
28    changes that are made. (We have more modules to adapt than you
29    do).
30
31    <p>A few notes on general pedagogical style here. In the
32    interest of conciseness, all structure declarations here are
33    incomplete --- the real ones have more slots that I'm not
34    telling you about. For the most part, these are reserved to one
35    component of the server core or another, and should be altered
36    by modules with caution. However, in some cases, they really
37    are things I just haven't gotten around to yet. Welcome to the
38    bleeding edge.</p>
39
40    <p>Finally, here's an outline, to give you some bare idea of
41    what's coming up, and in what order:</p>
42
43    <ul>
44      <li>
45        <a href="#basics">Basic concepts.</a>
46
47        <ul>
48          <li><a href="#HMR">Handlers, Modules, and
49          Requests</a></li>
50
51          <li><a href="#moduletour">A brief tour of a
52          module</a></li>
53        </ul>
54      </li>
55
56      <li>
57        <a href="#handlers">How handlers work</a>
58
59        <ul>
60          <li><a href="#req_tour">A brief tour of the
61          <code>request_rec</code></a></li>
62
63          <li><a href="#req_orig">Where request_rec structures come
64          from</a></li>
65
66          <li><a href="#req_return">Handling requests, declining,
67          and returning error codes</a></li>
68
69          <li><a href="#resp_handlers">Special considerations for
70          response handlers</a></li>
71
72          <li><a href="#auth_handlers">Special considerations for
73          authentication handlers</a></li>
74
75          <li><a href="#log_handlers">Special considerations for
76          logging handlers</a></li>
77        </ul>
78      </li>
79
80      <li><a href="#pools">Resource allocation and resource
81      pools</a></li>
82
83      <li>
84        <a href="#config">Configuration, commands and the like</a>
85
86        <ul>
87          <li><a href="#per-dir">Per-directory configuration
88          structures</a></li>
89
90          <li><a href="#commands">Command handling</a></li>
91
92          <li><a href="#servconf">Side notes --- per-server
93          configuration, virtual servers, <em>etc</em>.</a></li>
94        </ul>
95      </li>
96    </ul>
97
98    <h2><a id="basics" name="basics">Basic concepts.</a></h2>
99    We begin with an overview of the basic concepts behind the API,
100    and how they are manifested in the code.
101
102    <h3><a id="HMR" name="HMR">Handlers, Modules, and
103    Requests</a></h3>
104    Apache breaks down request handling into a series of steps,
105    more or less the same way the Netscape server API does
106    (although this API has a few more stages than NetSite does, as
107    hooks for stuff I thought might be useful in the future). These
108    are:
109
110    <ul>
111      <li>URI -&gt; Filename translation</li>
112
113      <li>Auth ID checking [is the user who they say they
114      are?]</li>
115
116      <li>Auth access checking [is the user authorized
117      <em>here</em>?]</li>
118
119      <li>Access checking other than auth</li>
120
121      <li>Determining MIME type of the object requested</li>
122
123      <li>`Fixups' --- there aren't any of these yet, but the phase
124      is intended as a hook for possible extensions like
125      <code>SetEnv</code>, which don't really fit well
126      elsewhere.</li>
127
128      <li>Actually sending a response back to the client.</li>
129
130      <li>Logging the request</li>
131    </ul>
    These phases are handled by looking at each of a succession of
    <em>modules</em>, checking whether each of them has a handler
    for the phase, and attempting to invoke it if so. The handler
    can typically do one of three things:
136
137    <ul>
138      <li><em>Handle</em> the request, and indicate that it has
139      done so by returning the magic constant <code>OK</code>.</li>
140
141      <li><em>Decline</em> to handle the request, by returning the
142      magic integer constant <code>DECLINED</code>. In this case,
143      the server behaves in all respects as if the handler simply
144      hadn't been there.</li>
145
146      <li>Signal an error, by returning one of the HTTP error
147      codes. This terminates normal handling of the request,
148      although an ErrorDocument may be invoked to try to mop up,
149      and it will be logged in any case.</li>
150    </ul>
151    Most phases are terminated by the first module that handles
152    them; however, for logging, `fixups', and non-access
153    authentication checking, all handlers always run (barring an
154    error). Also, the response phase is unique in that modules may
155    declare multiple handlers for it, via a dispatch table keyed on
156    the MIME type of the requested object. Modules may declare a
157    response-phase handler which can handle <em>any</em> request,
158    by giving it the key <code>*/*</code> (<em>i.e.</em>, a
159    wildcard MIME type specification). However, wildcard handlers
160    are only invoked if the server has already tried and failed to
161    find a more specific response handler for the MIME type of the
162    requested object (either none existed, or they all declined).
163
    <p>The handlers themselves are functions of one argument (a
    <code>request_rec</code> structure, vide infra), which return
    an integer, as above.</p>
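    <p>The dispatch rules above can be sketched in a few lines of
    C. This is a toy model, not the server's actual code: every
    <code>toy_</code> name, the stand-in return values, and the
    <code>run_all</code> flag (which models phases where all
    handlers run) are assumptions made for illustration.</p>

```c
#include <stddef.h>

/* Stand-in values for this sketch only; the real constants come
 * from the server's headers. */
#define TOY_OK         0
#define TOY_DECLINED (-1)
#define TOY_NOT_FOUND  404

typedef struct toy_request {
    const char *uri;
    int calls;              /* counts handler invocations, for the demo */
} toy_request;

typedef int (*toy_handler)(toy_request *);

/* Run one phase over a list of per-module handlers.  A NULL entry
 * means the module has no handler for this phase.  A declining
 * handler is treated as if it were absent; an error aborts at once.
 * If run_all is zero, the first OK ends the phase; otherwise every
 * remaining handler still gets its turn. */
int toy_run_phase(toy_handler *handlers, size_t n, toy_request *r,
                  int run_all)
{
    int result = TOY_DECLINED;
    size_t i;

    for (i = 0; i < n; i++) {
        int rc;

        if (handlers[i] == NULL)
            continue;
        rc = handlers[i](r);
        if (rc == TOY_DECLINED)
            continue;
        if (rc != TOY_OK)
            return rc;          /* error code: stop normal handling */
        result = TOY_OK;
        if (!run_all)
            return TOY_OK;      /* first handler to accept wins */
    }
    return result;
}

/* Three trivial handlers used to exercise the loop. */
int toy_decline(toy_request *r) { r->calls++; return TOY_DECLINED; }
int toy_accept(toy_request *r)  { r->calls++; return TOY_OK; }
int toy_fail(toy_request *r)    { r->calls++; return TOY_NOT_FOUND; }
```

    <p>With the handler list <code>{ toy_decline, NULL, toy_accept,
    toy_fail }</code>, a first-match phase stops at
    <code>toy_accept</code>, while a run-all phase goes on to
    <code>toy_fail</code> and reports its error.</p>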
167
168    <h3><a id="moduletour" name="moduletour">A brief tour of a
169    module</a></h3>
170    At this point, we need to explain the structure of a module.
171    Our candidate will be one of the messier ones, the CGI module
172    --- this handles both CGI scripts and the
173    <code>ScriptAlias</code> config file command. It's actually a
174    great deal more complicated than most modules, but if we're
175    going to have only one example, it might as well be the one
176    with its fingers in every place.
177
    <p>Let's begin with handlers. In order to handle the CGI
    scripts, the module declares a response handler for them.
    Because of <code>ScriptAlias</code>, it also has handlers for
    the name translation phase (to recognize
    <code>ScriptAlias</code>ed URIs) and the type-checking phase (any
    <code>ScriptAlias</code>ed request is typed as a CGI
    script).</p>
185
    <p>The module needs to maintain some per (virtual) server
    information, namely, the <code>ScriptAlias</code>es in effect;
    the module structure therefore contains pointers to a function
    which builds these structures, and to another which combines
    two of them (in case the main server and a virtual server both
    have <code>ScriptAlias</code>es declared).</p>
192
193    <p>Finally, this module contains code to handle the
194    <code>ScriptAlias</code> command itself. This particular module
195    only declares one command, but there could be more, so modules
196    have <em>command tables</em> which declare their commands, and
197    describe where they are permitted, and how they are to be
198    invoked.</p>
199
200    <p>A final note on the declared types of the arguments of some
201    of these commands: a <code>pool</code> is a pointer to a
202    <em>resource pool</em> structure; these are used by the server
203    to keep track of the memory which has been allocated, files
204    opened, <em>etc.</em>, either to service a particular request,
205    or to handle the process of configuring itself. That way, when
206    the request is over (or, for the configuration pool, when the
207    server is restarting), the memory can be freed, and the files
208    closed, <em>en masse</em>, without anyone having to write
209    explicit code to track them all down and dispose of them. Also,
210    a <code>cmd_parms</code> structure contains various information
211    about the config file being read, and other status information,
212    which is sometimes of use to the function which processes a
213    config-file command (such as <code>ScriptAlias</code>). With no
214    further ado, the module itself:</p>
<pre>
/* Declarations of handlers. */

int translate_scriptalias (request_rec *);
int type_scriptalias (request_rec *);
int cgi_handler (request_rec *);

/* Subsidiary dispatch table for response-phase handlers, by MIME type */

handler_rec cgi_handlers[] = {
{ "application/x-httpd-cgi", cgi_handler },
{ NULL }
};

/* Declarations of routines to manipulate the module's configuration
 * info.  Note that these are returned, and passed in, as void *'s;
 * the server core keeps track of them, but it doesn't, and can't,
 * know their internal structure.
 */

void *make_cgi_server_config (pool *);
void *merge_cgi_server_config (pool *, void *, void *);

/* Declarations of routines to handle config-file commands */

extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
                          char *real);

command_rec cgi_cmds[] = {
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
    "a fakename and a realname"},
{ NULL }
};

module cgi_module = {
   STANDARD_MODULE_STUFF,
   NULL,                     /* initializer */
   NULL,                     /* dir config creator */
   NULL,                     /* dir merger --- default is to override */
   make_cgi_server_config,   /* server config */
   merge_cgi_server_config,  /* merge server config */
   cgi_cmds,                 /* command table */
   cgi_handlers,             /* handlers */
   translate_scriptalias,    /* filename translation */
   NULL,                     /* check_user_id */
   NULL,                     /* check auth */
   NULL,                     /* check access */
   type_scriptalias,         /* type_checker */
   NULL,                     /* fixups */
   NULL,                     /* logger */
   NULL                      /* header parser */
};
</pre>
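    <p>To make the role of the command table more concrete, here is
    a small self-contained sketch of how such a table might be
    consulted when a config line is read. It is not the server's
    own dispatcher: the <code>toy_</code> types and functions are
    hypothetical, names are compared exactly for brevity, and the
    convention that a handler returns <code>NULL</code> on success
    and an error string otherwise is an assumption of this
    sketch.</p>

```c
#include <stddef.h>
#include <string.h>

/* A two-argument command handler: returns NULL on success, or an
 * error message (a convention assumed for this sketch). */
typedef const char *(*toy_take2_fn)(void *config, const char *a,
                                    const char *b);

typedef struct toy_command {
    const char *name;     /* directive name, NULL ends the table */
    toy_take2_fn func;    /* function that processes the directive */
    const char *usage;    /* message returned on wrong argument count */
} toy_command;

/* Hypothetical per-server configuration: a single alias pair. */
typedef struct toy_alias_conf {
    const char *fake;
    const char *real;
} toy_alias_conf;

const char *toy_script_alias(void *config, const char *fake,
                             const char *real)
{
    toy_alias_conf *conf = config;

    conf->fake = fake;
    conf->real = real;
    return NULL;
}

/* Look the directive up in the table, check that it received
 * exactly two arguments, and invoke its handler. */
const char *toy_dispatch(const toy_command *cmds, void *config,
                         const char *name, int argc,
                         const char **argv)
{
    for (; cmds->name != NULL; cmds++) {
        if (strcmp(cmds->name, name) != 0)
            continue;
        if (argc != 2)
            return cmds->usage;
        return cmds->func(config, argv[0], argv[1]);
    }
    return "unknown directive";
}
```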
268
269    <h2><a id="handlers" name="handlers">How handlers work</a></h2>
270    The sole argument to handlers is a <code>request_rec</code>
271    structure. This structure describes a particular request which
272    has been made to the server, on behalf of a client. In most
273    cases, each connection to the client generates only one
274    <code>request_rec</code> structure.
275
276    <h3><a id="req_tour" name="req_tour">A brief tour of the
277    <code>request_rec</code></a></h3>
278    The <code>request_rec</code> contains pointers to a resource
279    pool which will be cleared when the server is finished handling
280    the request; to structures containing per-server and
281    per-connection information, and most importantly, information
282    on the request itself.
283
284    <p>The most important such information is a small set of
285    character strings describing attributes of the object being
286    requested, including its URI, filename, content-type and
287    content-encoding (these being filled in by the translation and
288    type-check handlers which handle the request,
289    respectively).</p>
290
291    <p>Other commonly used data items are tables giving the MIME
292    headers on the client's original request, MIME headers to be
293    sent back with the response (which modules can add to at will),
294    and environment variables for any subprocesses which are
295    spawned off in the course of servicing the request. These
296    tables are manipulated using the <code>ap_table_get</code> and
297    <code>ap_table_set</code> routines.</p>
298
299    <blockquote>
300      Note that the <samp>Content-type</samp> header value
301      <em>cannot</em> be set by module content-handlers using the
302      <samp>ap_table_*()</samp> routines. Rather, it is set by
303      pointing the <samp>content_type</samp> field in the
304      <samp>request_rec</samp> structure to an appropriate string.
305      <em>E.g.</em>,
<pre>
  r-&gt;content_type = "text/html";
</pre>
309    </blockquote>
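    <p>For readers who have not met this kind of table before, the
    following sketch shows the get/set behaviour in miniature. It
    is a stand-in, not the server's table implementation: the
    <code>toy_</code> names, the fixed capacity, and the choice to
    store pointers rather than copies are all simplifications
    assumed here. Keys are compared without regard to case, as
    suits header names.</p>

```c
#include <stddef.h>
#include <ctype.h>

#define TOY_TABLE_MAX 16    /* fixed capacity, for the sketch only */

typedef struct toy_table {
    const char *keys[TOY_TABLE_MAX];
    const char *vals[TOY_TABLE_MAX];
    size_t n;               /* number of entries in use */
} toy_table;

/* Case-insensitive string equality. */
int toy_key_eq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the value stored under key, or NULL if there is none. */
const char *toy_table_get(const toy_table *t, const char *key)
{
    size_t i;

    for (i = 0; i < t->n; i++)
        if (toy_key_eq(t->keys[i], key))
            return t->vals[i];
    return NULL;
}

/* Store val under key, replacing any existing entry for that key.
 * Returns 0 on success, -1 if the table is full.  Only the
 * pointers are stored; nothing is copied. */
int toy_table_set(toy_table *t, const char *key, const char *val)
{
    size_t i;

    for (i = 0; i < t->n; i++) {
        if (toy_key_eq(t->keys[i], key)) {
            t->vals[i] = val;
            return 0;
        }
    }
    if (t->n == TOY_TABLE_MAX)
        return -1;
    t->keys[t->n] = key;
    t->vals[t->n] = val;
    t->n++;
    return 0;
}
```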
    Finally, there are pointers to two data structures which, in
    turn, point to per-module configuration structures.
    Specifically, these hold pointers to the data structures which
    the module has built to describe the way it has been configured
    to operate in a given directory (via <code>.htaccess</code>
    files or <code>&lt;Directory&gt;</code> sections), and for private
    data it has built in the course of servicing the request (so
    modules' handlers for one phase can pass `notes' to their
    handlers for other phases). There is another such configuration
    vector in the <code>server_rec</code> data structure pointed to
    by the <code>request_rec</code>, which contains per (virtual)
    server configuration data.
322
323    <p>Here is an abridged declaration, giving the fields most
324    commonly used:</p>
<pre>
struct request_rec {

  pool *pool;
  conn_rec *connection;
  server_rec *server;

  /* What object is being requested */

  char *uri;
  char *filename;
  char *path_info;
  char *args;           /* QUERY_ARGS, if any */
  struct stat finfo;    /* Set by server core;
                         * st_mode set to zero if no such file */

  char *content_type;
  char *content_encoding;

  /* MIME header environments, in and out.  Also, an array containing
   * environment variables to be passed to subprocesses, so people can
   * write modules to add to that environment.
   *
   * The difference between headers_out and err_headers_out is that
   * the latter are printed even on error, and persist across internal
   * redirects (so the headers printed for ErrorDocument handlers will
   * have them).
   */

  table *headers_in;
  table *headers_out;
  table *err_headers_out;
  table *subprocess_env;

  /* Info about the request itself... */

  int header_only;     /* HEAD request, as opposed to GET */
  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
  char *method;        /* GET, HEAD, POST, <em>etc.</em> */
  int method_number;   /* M_GET, M_POST, <em>etc.</em> */

  /* Info for logging */

  char *the_request;
  int bytes_sent;

  /* A flag which modules can set, to indicate that the data being
   * returned is volatile, and clients should be told not to cache it.
   */

  int no_cache;

  /* Various other config info which may change with .htaccess files
   * These are config vectors, with one void* pointer for each module
   * (the thing pointed to being the module's business).
   */

  void *per_dir_config;   /* Options set in config files, <em>etc.</em> */
  void *request_config;   /* Notes on *this* request */

};

</pre>
388
389    <h3><a id="req_orig" name="req_orig">Where request_rec
390    structures come from</a></h3>
391    Most <code>request_rec</code> structures are built by reading
392    an HTTP request from a client, and filling in the fields.
393    However, there are a few exceptions:
394
395    <ul>
396      <li>If the request is to an imagemap, a type map
397      (<em>i.e.</em>, a <code>*.var</code> file), or a CGI script
398      which returned a local `Location:', then the resource which
399      the user requested is going to be ultimately located by some
400      URI other than what the client originally supplied. In this
401      case, the server does an <em>internal redirect</em>,
402      constructing a new <code>request_rec</code> for the new URI,
403      and processing it almost exactly as if the client had
404      requested the new URI directly.</li>
405
406      <li>If some handler signaled an error, and an
407      <code>ErrorDocument</code> is in scope, the same internal
408      redirect machinery comes into play.</li>
409
410      <li>
411        Finally, a handler occasionally needs to investigate `what
412        would happen if' some other request were run. For instance,
413        the directory indexing module needs to know what MIME type
414        would be assigned to a request for each directory entry, in
415        order to figure out what icon to use.
416
        <p>Such handlers can construct a <em>sub-request</em>,
        using the functions <code>ap_sub_req_lookup_file</code>,
        <code>ap_sub_req_lookup_uri</code>, and
        <code>ap_sub_req_method_uri</code>; these construct a new
        <code>request_rec</code> structure and process it as you
        would expect, up to but not including the point of actually
        sending a response. (These functions skip over the access
        checks if the sub-request is for a file in the same
        directory as the original request).</p>
426
427        <p>(Server-side includes work by building sub-requests and
428        then actually invoking the response handler for them, via
429        the function <code>ap_run_sub_req</code>).</p>
430      </li>
431    </ul>
432
433    <h3><a id="req_return" name="req_return">Handling requests,
434    declining, and returning error codes</a></h3>
435    As discussed above, each handler, when invoked to handle a
436    particular <code>request_rec</code>, has to return an
437    <code>int</code> to indicate what happened. That can either be
438
439    <ul>
440      <li>OK --- the request was handled successfully. This may or
441      may not terminate the phase.</li>
442
443      <li>DECLINED --- no erroneous condition exists, but the
444      module declines to handle the phase; the server tries to find
445      another.</li>
446
447      <li>an HTTP error code, which aborts handling of the
448      request.</li>
449    </ul>
450    Note that if the error code returned is <code>REDIRECT</code>,
451    then the module should put a <code>Location</code> in the
452    request's <code>headers_out</code>, to indicate where the
453    client should be redirected <em>to</em>.
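    <p>A redirecting handler therefore does two things: it records
    the target, and it returns the redirect status. The sketch
    below shows that shape with stand-in types. The
    <code>toy_</code> names, the <code>/old/</code> and
    <code>/new/</code> paths, the numeric values, and the single
    <code>location_out</code> field (standing in for a
    <code>Location</code> entry in <code>headers_out</code>) are
    all assumptions made for illustration.</p>

```c
#include <string.h>

/* Stand-in values for this sketch only. */
#define TOY_DECLINED (-1)
#define TOY_REDIRECT  302

typedef struct toy_request {
    const char *uri;
    const char *location_out;   /* models headers_out "Location" */
} toy_request;

/* Redirect anything under the hypothetical /old/ prefix to /new/;
 * decline everything else so other handlers can deal with it. */
int toy_moved_handler(toy_request *r)
{
    if (strncmp(r->uri, "/old/", 5) != 0)
        return TOY_DECLINED;

    r->location_out = "/new/";  /* where the client should go */
    return TOY_REDIRECT;
}
```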
454
455    <h3><a id="resp_handlers" name="resp_handlers">Special
456    considerations for response handlers</a></h3>
    Handlers for most phases do their work by simply setting a few
    fields in the <code>request_rec</code> structure (or, in the
    case of access checkers, simply by returning the correct error
    code). However, response handlers have to actually send a
    response back to the client.
462
463    <p>They should begin by sending an HTTP response header, using
464    the function <code>ap_send_http_header</code>. (You don't have
465    to do anything special to skip sending the header for HTTP/0.9
466    requests; the function figures out on its own that it shouldn't
467    do anything). If the request is marked
468    <code>header_only</code>, that's all they should do; they
469    should return after that, without attempting any further
470    output.</p>
471
    <p>Otherwise, they should produce a response body which answers
    the client as appropriate. The primitives for this are
    <code>ap_rputc</code> and <code>ap_rprintf</code>, for
    internally generated output, and <code>ap_send_fd</code>, to
    copy the contents of some <code>FILE *</code> straight to the
    client.</p>
478
    <p>At this point, you should more or less understand the
    following piece of code, which is the handler for
    <code>GET</code> requests that have no more specific handler.
    It also shows how conditional <code>GET</code>s can be handled,
    if it's desirable to do so in a particular response handler ---
    <code>ap_set_last_modified</code> checks against the
    <code>If-modified-since</code> value supplied by the client, if
    any, and returns an appropriate code (which will, if nonzero,
    be USE_LOCAL_COPY). No similar considerations apply for
    <code>ap_set_content_length</code>, but it returns an error
    code for symmetry.</p>
<pre>
int default_handler (request_rec *r)
{
    int errstatus;
    FILE *f;

    if (r-&gt;method_number != M_GET) return DECLINED;
    if (r-&gt;finfo.st_mode == 0) return NOT_FOUND;

    if ((errstatus = ap_set_content_length (r, r-&gt;finfo.st_size))) {
        return errstatus;
    }

    r-&gt;mtime = r-&gt;finfo.st_mtime;
    ap_set_last_modified (r);

    f = ap_pfopen (r-&gt;pool, r-&gt;filename, "r");

    if (f == NULL) {
        ap_log_rerror(APLOG_MARK, APLOG_ERR, r,
             "file permissions deny server access: %s", r-&gt;filename);
        return FORBIDDEN;
    }

    ap_soft_timeout ("send", r);
    ap_send_http_header (r);

    if (!r-&gt;header_only) ap_send_fd (f, r);
    ap_pfclose (r-&gt;pool, f);

    ap_kill_timeout (r);
    return OK;
}
</pre>
524    Finally, if all of this is too much of a challenge, there are a
525    few ways out of it. First off, as shown above, a response
526    handler which has not yet produced any output can simply return
527    an error code, in which case the server will automatically
528    produce an error response. Secondly, it can punt to some other
529    handler by invoking <code>ap_internal_redirect</code>, which is
530    how the internal redirection machinery discussed above is
531    invoked. A response handler which has internally redirected
532    should always return <code>OK</code>.
533
534    <p>(Invoking <code>ap_internal_redirect</code> from handlers
535    which are <em>not</em> response handlers will lead to serious
536    confusion).</p>
537
538    <h3><a id="auth_handlers" name="auth_handlers">Special
539    considerations for authentication handlers</a></h3>
540    Stuff that should be discussed here in detail:
541
542    <ul>
543      <li>Authentication-phase handlers not invoked unless auth is
544      configured for the directory.</li>
545
546      <li>Common auth configuration stored in the core per-dir
547      configuration; it has accessors <code>ap_auth_type</code>,
548      <code>ap_auth_name</code>, and <code>ap_requires</code>.</li>
549
550      <li>Common routines, to handle the protocol end of things, at
551      least for HTTP basic authentication
552      (<code>ap_get_basic_auth_pw</code>, which sets the
553      <code>connection-&gt;user</code> structure field
554      automatically, and <code>ap_note_basic_auth_failure</code>,
555      which arranges for the proper <code>WWW-Authenticate:</code>
556      header to be sent back).</li>
557    </ul>
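    <p>The "protocol end of things" for HTTP basic authentication
    amounts to decoding a base64 string of the form
    <code>user:password</code> from the client's request header. A
    module would normally leave this to the common routines named
    above; the sketch below only illustrates what is involved. The
    <code>toy_</code> functions are hypothetical, the scheme name
    is matched exactly for brevity, and the 256-byte scratch buffer
    is an arbitrary limit.</p>

```c
#include <stddef.h>
#include <string.h>

/* Value of one base64 character, or -1 if it is not one. */
int toy_b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode base64 text into out (capacity cap, including the final
 * NUL).  Decoding stops at '=' padding or end of string.  Returns
 * the decoded length, or -1 on bad input or lack of space. */
int toy_b64_decode(const char *in, char *out, size_t cap)
{
    unsigned long acc = 0;  /* bit accumulator */
    int bits = 0;           /* number of valid bits in acc */
    size_t n = 0;

    if (cap == 0)
        return -1;
    for (; *in != '\0' && *in != '='; in++) {
        int v = toy_b64_value((unsigned char)*in);

        if (v < 0)
            return -1;
        acc = (acc << 6) | (unsigned long)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n + 1 >= cap)
                return -1;
            out[n++] = (char)((acc >> bits) & 0xFF);
            acc &= (1UL << bits) - 1;   /* keep only leftover bits */
        }
    }
    out[n] = '\0';
    return (int)n;
}

/* Split a header value of the form "Basic <base64(user:password)>"
 * into user and password.  Returns 0 on success, -1 otherwise. */
int toy_parse_basic_auth(const char *header,
                         char *user, size_t ucap,
                         char *pw, size_t pcap)
{
    char decoded[256];
    const char *colon;
    size_t ulen;

    if (strncmp(header, "Basic ", 6) != 0)
        return -1;
    if (toy_b64_decode(header + 6, decoded, sizeof decoded) < 0)
        return -1;
    colon = strchr(decoded, ':');
    if (colon == NULL)
        return -1;
    ulen = (size_t)(colon - decoded);
    if (ulen + 1 > ucap || strlen(colon + 1) + 1 > pcap)
        return -1;
    memcpy(user, decoded, ulen);
    user[ulen] = '\0';
    strcpy(pw, colon + 1);
    return 0;
}
```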
558
559    <h3><a id="log_handlers" name="log_handlers">Special
560    considerations for logging handlers</a></h3>
561    When a request has internally redirected, there is the question
562    of what to log. Apache handles this by bundling the entire
563    chain of redirects into a list of <code>request_rec</code>
564    structures which are threaded through the
565    <code>r-&gt;prev</code> and <code>r-&gt;next</code> pointers.
566    The <code>request_rec</code> which is passed to the logging
567    handlers in such cases is the one which was originally built
568    for the initial request from the client; note that the
569    bytes_sent field will only be correct in the last request in
570    the chain (the one for which a response was actually sent).
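    <p>A logging handler that wants the byte count therefore has to
    walk forward from the request it is given. The sketch below
    shows the walk on a stand-in structure; <code>toy_request</code>
    and <code>toy_final_request</code> are hypothetical names, and
    only the <code>prev</code>, <code>next</code> and
    <code>bytes_sent</code> fields mirror the description
    above.</p>

```c
#include <stddef.h>

/* Stand-in for the parts of a request record that matter here. */
typedef struct toy_request {
    const char *uri;
    long bytes_sent;
    struct toy_request *prev;   /* request we were redirected from */
    struct toy_request *next;   /* request we were redirected to */
} toy_request;

/* Follow the next pointers to the last request in the chain, which
 * is the one whose response was actually sent. */
const toy_request *toy_final_request(const toy_request *r)
{
    while (r->next != NULL)
        r = r->next;
    return r;
}
```

    <p>A logger written this way would take the URI from the
    request it was handed and the byte count from
    <code>toy_final_request(r)-&gt;bytes_sent</code>.</p>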
571
572    <h2><a id="pools" name="pools">Resource allocation and resource
573    pools</a></h2>
574
    <p>One of the problems of writing and designing a server-pool
    server is that of preventing leakage, that is, allocating
    resources (memory, open files, <em>etc.</em>), without
    subsequently releasing them. The resource pool machinery is
    designed to make it easy to prevent this from happening, by
    allowing resources to be allocated in such a way that they are
    <em>automatically</em> released when the server is done with
    them.</p>
583
    <p>The way this works is as follows: the memory which is
    allocated, files opened, <em>etc.</em>, to deal with a
    particular request are tied to a <em>resource pool</em> which
    is allocated for the request. The pool is a data structure
    which itself tracks the resources in question.</p>
589
    <p>When the request has been processed, the pool is
    <em>cleared</em>. At that point, all the memory associated with
    it is released for reuse, all files associated with it are
    closed, and any other clean-up functions which are associated
    with the pool are run. When this is over, we can be confident
    that all the resources tied to the pool have been released, and
    that none of them have leaked.</p>
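    <p>The clean-up side of this can be modelled in a few lines.
    The following is a sketch of the idea only, not the server's
    pool code: the <code>toy_</code> names and the fixed-size
    cleanup list are assumptions, and the order in which this
    sketch runs the cleanups (most recent first) is simply a choice
    made here.</p>

```c
#include <stddef.h>

#define TOY_MAX_CLEANUPS 8  /* fixed capacity, for the sketch only */

typedef void (*toy_cleanup_fn)(void *data);

typedef struct toy_pool {
    toy_cleanup_fn fns[TOY_MAX_CLEANUPS];
    void *data[TOY_MAX_CLEANUPS];
    size_t n;               /* number of registered cleanups */
} toy_pool;

/* Tie a resource to the pool by registering the function that
 * releases it.  Returns 0 on success, -1 if the list is full. */
int toy_register_cleanup(toy_pool *p, void *data, toy_cleanup_fn fn)
{
    if (p->n == TOY_MAX_CLEANUPS)
        return -1;
    p->fns[p->n] = fn;
    p->data[p->n] = data;
    p->n++;
    return 0;
}

/* Clear the pool: run every registered cleanup, most recent first,
 * and forget it, leaving the pool empty and ready for reuse. */
void toy_clear_pool(toy_pool *p)
{
    while (p->n > 0) {
        p->n--;
        p->fns[p->n](p->data[p->n]);
    }
}

/* Example resource: a pretend file handle and its cleanup. */
typedef struct toy_file {
    int is_open;
} toy_file;

void toy_close_file(void *data)
{
    ((toy_file *)data)->is_open = 0;
}
```

    <p>Code that opens a resource registers its cleanup once and
    never needs to close it explicitly; a single call to
    <code>toy_clear_pool</code> releases everything.</p>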
597
598    <p>Server restarts, and allocation of memory and resources for
599    per-server configuration, are handled in a similar way. There
600    is a <em>configuration pool</em>, which keeps track of
601    resources which were allocated while reading the server
602    configuration files, and handling the commands therein (for
603    instance, the memory that was allocated for per-server module
604    configuration, log files and other files that were opened, and
605    so forth). When the server restarts, and has to reread the
606    configuration files, the configuration pool is cleared, and so
607    the memory and file descriptors which were taken up by reading
608    them the last time are made available for reuse.</p>
609
    <p>Use of the pool machinery isn't generally obligatory, with
    two exceptions. The first is situations like logging handlers,
    where you really need to register cleanups to make sure that
    the log file gets closed when the server restarts; this is most
    easily done by using the function <code><a
    href="#pool-files">ap_pfopen</a></code>, which also arranges
    for the underlying file descriptor to be closed before any
    child processes, such as for CGI scripts, are
    <code>exec</code>ed. The second is use of the timeout
    machinery (which isn't yet even documented here). However,
    there are two benefits to using pools: resources allocated to a
    pool never leak (even if you allocate a scratch string, and
    just forget about it); also, for memory allocation,
    <code>ap_palloc</code> is generally faster than
    <code>malloc</code>.</p>
625
626    <p>We begin here by describing how memory is allocated to
627    pools, and then discuss how other resources are tracked by the
628    resource pool machinery.</p>
629
630    <h3>Allocation of memory in pools</h3>
631
632    <p>Memory is allocated to pools by calling the function
633    <code>ap_palloc</code>, which takes two arguments, one being a
634    pointer to a resource pool structure, and the other being the
635    amount of memory to allocate (in <code>char</code>s). Within
636    handlers for handling requests, the most common way of getting
637    a resource pool structure is by looking at the
638    <code>pool</code> slot of the relevant
639    <code>request_rec</code>; hence the repeated appearance of the
640    following idiom in module code:</p>
<pre>
int my_handler(request_rec *r)
{
    struct my_structure *foo;
    ...

    foo = (struct my_structure *)ap_palloc (r-&gt;pool, sizeof(struct my_structure));
}
</pre>
650
651    <p>Note that <em>there is no <code>ap_pfree</code></em> ---
652    <code>ap_palloc</code>ed memory is freed only when the
653    associated resource pool is cleared. This means that
654    <code>ap_palloc</code> does not have to do as much accounting
655    as <code>malloc()</code>; all it does in the typical case is to
656    round up the size, bump a pointer, and do a range check.</p>
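    <p>The "round up, bump a pointer, range check" sequence is easy
    to see in a toy allocator. The sketch below is not
    <code>ap_palloc</code>: the <code>toy_</code> names, the
    8-byte rounding, and the single fixed buffer are assumptions
    made for illustration, and where this version simply fails when
    the buffer is exhausted, a real pool would obtain more
    memory.</p>

```c
#include <stddef.h>

#define TOY_ALIGN 8     /* rounding granularity chosen for this sketch */

typedef struct toy_arena {
    unsigned char *base;    /* start of the backing buffer */
    size_t used;            /* bytes handed out so far */
    size_t capacity;        /* total size of the buffer */
} toy_arena;

void toy_arena_init(toy_arena *a, void *buf, size_t capacity)
{
    a->base = buf;
    a->used = 0;
    a->capacity = capacity;
}

/* Allocate size bytes: round the size up, check that it fits, and
 * bump the used mark.  There is no per-block bookkeeping and no
 * way to free a single allocation. */
void *toy_arena_alloc(toy_arena *a, size_t size)
{
    size_t rounded = (size + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    void *result;

    if (rounded < size)                     /* rounding overflowed */
        return NULL;
    if (rounded > a->capacity - a->used)    /* range check */
        return NULL;
    result = a->base + a->used;
    a->used += rounded;                     /* bump the pointer */
    return result;
}

/* Release everything at once, as clearing a pool does. */
void toy_arena_clear(toy_arena *a)
{
    a->used = 0;
}
```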
657
658    <p>(It also raises the possibility that heavy use of
659    <code>ap_palloc</code> could cause a server process to grow
660    excessively large. There are two ways to deal with this, which
661    are dealt with below; briefly, you can use <code>malloc</code>,
662    and try to be sure that all of the memory gets explicitly
663    <code>free</code>d, or you can allocate a sub-pool of the main
664    pool, allocate your memory in the sub-pool, and clear it out
665    periodically. The latter technique is discussed in the section
666    on sub-pools below, and is used in the directory-indexing code,
667    in order to avoid excessive storage allocation when listing
668    directories with thousands of files).</p>
669
670    <h3>Allocating initialized memory</h3>
671
672    <p>There are functions which allocate initialized memory, and
673    are frequently useful. The function <code>ap_pcalloc</code> has
674    the same interface as <code>ap_palloc</code>, but clears out
675    the memory it allocates before it returns it. The function
676    <code>ap_pstrdup</code> takes a resource pool and a <code>char
677    *</code> as arguments, and allocates memory for a copy of the
678    string the pointer points to, returning a pointer to the copy.
679    Finally <code>ap_pstrcat</code> is a varargs-style function,
680    which takes a pointer to a resource pool, and at least two
681    <code>char *</code> arguments, the last of which must be
682    <code>NULL</code>. It allocates enough memory to fit copies of
683    each of the strings, as a unit; for instance:</p>
<pre>
     ap_pstrcat (r-&gt;pool, "foo", "/", "bar", NULL);
</pre>
687
688    <p>returns a pointer to 8 bytes worth of memory, initialized to
689    <code>"foo/bar"</code>.</p>
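    <p>The varargs pattern behind such a function is worth seeing
    once: measure all the pieces, allocate a single block, then
    copy. The sketch below is a stand-in that uses
    <code>malloc</code> instead of a pool, so unlike the pool
    version its result must be freed by the caller;
    <code>toy_strcat_all</code> is a hypothetical name.</p>

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Concatenate a NULL-terminated list of strings into one freshly
 * allocated buffer.  Returns NULL if allocation fails.  The caller
 * owns the result and must free() it. */
char *toy_strcat_all(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t total = 1;       /* room for the terminating NUL */
    char *out;
    char *cursor;

    /* Pass 1: add up the lengths of all the pieces. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    out = malloc(total);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy each piece into place. */
    cursor = out;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *)) {
        size_t len = strlen(s);

        memcpy(cursor, s, len);
        cursor += len;
    }
    va_end(ap);
    *cursor = '\0';
    return out;
}
```

    <p>Called as <code>toy_strcat_all("foo", "/", "bar", (char
    *)NULL)</code>, it allocates the same 8 bytes as the example
    above: seven characters plus the terminating NUL.</p>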
690
691    <h3><a id="pools-used" name="pools-used">Commonly-used pools in
692    the Apache Web server</a></h3>
693
694    <p>A pool is really defined by its lifetime more than anything
695    else. There are some static pools in http_main which are passed
696    to various non-http_main functions as arguments at opportune
697    times. Here they are:</p>
698
699    <dl compact="compact">
700      <dt>permanent_pool</dt>
701
702      <dd>
703        <ul>
704          <li>never passed to anything else, this is the ancestor
705          of all pools</li>
706        </ul>
707      </dd>
708
709      <dt>pconf</dt>
710
711      <dd>
712        <ul>
713          <li>subpool of permanent_pool</li>
714
715          <li>created at the beginning of a config "cycle"; exists
716          until the server is terminated or restarts; passed to all
717          config-time routines, either via cmd-&gt;pool, or as the
718          "pool *p" argument on those which don't take pools</li>
719
720          <li>passed to the module init() functions</li>
721        </ul>
722      </dd>
723
724      <dt>ptemp</dt>
725
726      <dd>
727        <ul>
          <li>Strictly speaking, this pool is not called ptemp in
          1.3; I gave it that name in my pthreads development
          work. It refers to the use of ptrans in the parent;
          contrast this with the later definition of ptrans in the
          child.</li>
733
734          <li>subpool of permanent_pool</li>
735
736          <li>created at the beginning of a config "cycle"; exists
737          until the end of config parsing; passed to config-time
738          routines <em>via</em> cmd-&gt;temp_pool. Somewhat of a
739          "bastard child" because it isn't available everywhere.
740          Used for temporary scratch space which may be needed by
741          some config routines but which is deleted at the end of
742          config.</li>
743        </ul>
744      </dd>
745
746      <dt>pchild</dt>
747
748      <dd>
749        <ul>
750          <li>subpool of permanent_pool</li>
751
752          <li>created when a child is spawned (or a thread is
753          created); lives until that child (thread) is
754          destroyed</li>
755
756          <li>passed to the module child_init functions</li>
757
758          <li>destruction happens right after the child_exit
759          functions are called... (which may explain why I think
760          child_exit is redundant and unneeded)</li>
761        </ul>
762      </dd>
763
764      <dt>ptrans</dt>
765
766      <dd>
767        <ul>
768          <li>should be a subpool of pchild, but currently is a
769          subpool of permanent_pool, see above</li>
770
771          <li>cleared by the child before going into the accept()
772          loop to receive a connection</li>
773
774          <li>used as connection-&gt;pool</li>
775        </ul>
776      </dd>
777
778      <dt>r-&gt;pool</dt>
779
780      <dd>
781        <ul>
782          <li>for the main request this is a subpool of
783          connection-&gt;pool; for subrequests it is a subpool of
784          the parent request's pool.</li>
785
786          <li>exists until the end of the request (<em>i.e.</em>,
787          ap_destroy_sub_req, or in child_main after
788          process_request has finished)</li>
789
790          <li>note that r itself is allocated from r-&gt;pool;
791          <em>i.e.</em>, r-&gt;pool is first created and then r is
792          the first thing palloc()d from it</li>
793        </ul>
794      </dd>
795    </dl>
796
797    <p>For almost everything folks do, r-&gt;pool is the pool to
798    use. But you can see how other lifetimes, such as pchild, are
799    useful to some modules... such as modules that need to open a
800    database connection once per child, and wish to clean it up
801    when the child dies.</p>
802
    <p>You can also see how some bugs have manifested themselves,
    such as setting connection-&gt;user to a value from r-&gt;pool.
    In this case the connection exists for the lifetime of ptrans,
    which is longer than that of r-&gt;pool (especially if r-&gt;pool
    belongs to a subrequest!), so the stored pointer can outlive
    the memory it points to. The correct thing to do is to
    allocate from connection-&gt;pool.</p>
809
    <p>There was another interesting bug in mod_include/mod_cgi.
    You'll see that both modules test whether to use r-&gt;pool or
    r-&gt;main-&gt;pool. In this case the resource that they are
    registering for cleanup is a child process. If it were
    registered in r-&gt;pool, then the code would wait() for the
    child when the subrequest finishes. With mod_include this
    could be any old #include, and the delay could be up to 3
    seconds... and it happened quite frequently. Instead the
    subprocess is registered in r-&gt;main-&gt;pool, which causes it
    to be cleaned up when the entire request is done --
    <em>i.e.</em>, after the output has been sent to the client
    and logging has happened.</p>
822
823    <h3><a id="pool-files" name="pool-files">Tracking open files,
824    etc.</a></h3>
825
826    <p>As indicated above, resource pools are also used to track
827    other sorts of resources besides memory. The most common are
828    open files. The routine which is typically used for this is
829    <code>ap_pfopen</code>, which takes a resource pool and two
830    strings as arguments; the strings are the same as the typical
831    arguments to <code>fopen</code>, <em>e.g.</em>,</p>
<pre>
     ...
     FILE *f = ap_pfopen (r-&gt;pool, r-&gt;filename, "r");

     if (f == NULL) { ... } else { ... }
</pre>
838
    <p>There is also an <code>ap_popenf</code> routine, which
    parallels the lower-level <code>open</code> system call. Both
    of these routines arrange for the file to be closed when the
    resource pool in question is cleared.</p>
843
    <p>Unlike the case for memory, there <em>are</em> functions to
    close files allocated with <code>ap_pfopen</code> and
    <code>ap_popenf</code>, namely <code>ap_pfclose</code> and
    <code>ap_pclosef</code>. (This is because, on many systems, the
    number of files which a single process can have open is quite
    limited). It is important to use these functions, rather than
    plain <code>fclose</code> or <code>close</code>, to close files
    allocated with <code>ap_pfopen</code> and
    <code>ap_popenf</code>: otherwise the pool would close the
    file a second time when it is cleared, which could cause fatal
    errors on systems such as Linux, which react badly if the same
    <code>FILE*</code> is closed more than once.</p>
854
855    <p>(Using the <code>close</code> functions is not mandatory,
856    since the file will eventually be closed regardless, but you
857    should consider it in cases where your module is opening, or
858    could open, a lot of files).</p>
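    <p>The reason for pairing each open routine with its own close
    routine can be shown with a toy model. The sketch below is not
    Apache's code: all <code>toy_</code> names are hypothetical, and
    a small struct that counts close calls stands in for a real
    <code>FILE*</code> so that a double close would be visible.</p>

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FILES 8

/* Fake "file" (hypothetical): counts close calls so that a double
 * close can be detected. */
typedef struct {
    int is_open;
    int close_calls;
} toy_file;

/* Toy pool that tracks nothing but open files. */
typedef struct {
    toy_file *tracked[TOY_MAX_FILES];
    int ntracked;
} toy_pool;

static void toy_raw_close(toy_file *f)
{
    assert(f != NULL);
    f->is_open = 0;
    f->close_calls++;
}

/* "Open" f and remember it in the pool (compare ap_pfopen). */
toy_file *toy_pfopen(toy_pool *p, toy_file *f)
{
    if (p->ntracked == TOY_MAX_FILES)
        return NULL;
    f->is_open = 1;
    f->close_calls = 0;
    p->tracked[p->ntracked++] = f;
    return f;
}

/* Close early AND forget the file (compare ap_pfclose), so that
 * clearing the pool later cannot close it a second time. */
int toy_pfclose(toy_pool *p, toy_file *f)
{
    int i;
    for (i = 0; i < p->ntracked; i++) {
        if (p->tracked[i] == f) {
            p->tracked[i] = p->tracked[--p->ntracked];
            toy_raw_close(f);
            return 0;
        }
    }
    return -1;                /* not tracked by this pool */
}

/* Close whatever is still tracked (compare clearing the pool). */
void toy_clear(toy_pool *p)
{
    while (p->ntracked > 0)
        toy_raw_close(p->tracked[--p->ntracked]);
}
```

    <p>A file closed through <code>toy_pfclose</code> is closed
    exactly once, because the pool no longer knows about it by the
    time <code>toy_clear</code> runs. Closing it behind the pool's
    back would leave the entry in place and lead to a second
    close.</p>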
859
860    <h3>Other sorts of resources --- cleanup functions</h3>
861
862    <blockquote>
      More text goes here. Describe the cleanup primitives in
      terms of which the file stuff is implemented; also,
      <code>spawn_process</code>.
866    </blockquote>
867
    <p>Pool cleanups live until clear_pool() is called:
    clear_pool(a) recursively calls destroy_pool() on all subpools
    of a; then calls all the cleanups for a; then releases all the
    memory for a. destroy_pool(a) calls clear_pool(a) and then
    releases the pool structure itself. In other words,
    clear_pool(a) doesn't delete a; it just frees up all the
    resources, and you can start using it again immediately.</p>
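    <p>The ordering just described can be modeled with a toy pool.
    This is a sketch under stated assumptions, not the server's
    implementation: every <code>toy_</code> name is hypothetical,
    cleanups sit in a simple linked list, and no memory blocks are
    tracked. The <code>toy_note</code> cleanup only records the
    order in which cleanups fire.</p>

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *sibling;
    toy_cleanup *cleanups;
} toy_pool;

/* Log of cleanup tags, in the order the cleanups ran. */
char toy_log[16];
int toy_log_len = 0;

/* Example cleanup: appends the first character of data to the log. */
void toy_note(void *data)
{
    if (toy_log_len < (int)sizeof(toy_log))
        toy_log[toy_log_len++] = *(const char *)data;
}

/* Create a pool; pass NULL as parent for a root pool. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

int toy_register_cleanup(toy_pool *p, void *data, void (*fn)(void *))
{
    toy_cleanup *c = malloc(sizeof(*c));
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

void toy_destroy_pool(toy_pool *p);

void toy_clear_pool(toy_pool *p)
{
    /* 1. Destroy every subpool (each one unlinks itself from p). */
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);

    /* 2. Run and discard this pool's own cleanups. */
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
    /* 3. A real pool would release its memory blocks here.
     * The pool structure itself survives and can be reused. */
}

void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        /* Unlink p from its parent's list of children. */
        toy_pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);                  /* the structure itself goes away */
}
```

    <p>With one cleanup on a pool and one on its subpool, clearing
    the parent runs the subpool's cleanup first, and the parent can
    take new cleanups afterwards.</p>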
875
876    <h3>Fine control --- creating and dealing with sub-pools, with
877    a note on sub-requests</h3>
878    On rare occasions, too-free use of <code>ap_palloc()</code> and
879    the associated primitives may result in undesirably profligate
880    resource allocation. You can deal with such a case by creating
881    a <em>sub-pool</em>, allocating within the sub-pool rather than
882    the main pool, and clearing or destroying the sub-pool, which
883    releases the resources which were associated with it. (This
    really <em>is</em> a rare situation; the only case in which it
    comes up in the standard module set is when listing
    directories, and then only with <em>very</em> large
887    directories. Unnecessary use of the primitives discussed here
888    can hair up your code quite a bit, with very little gain).
889
890    <p>The primitive for creating a sub-pool is
891    <code>ap_make_sub_pool</code>, which takes another pool (the
892    parent pool) as an argument. When the main pool is cleared, the
893    sub-pool will be destroyed. The sub-pool may also be cleared or
894    destroyed at any time, by calling the functions
895    <code>ap_clear_pool</code> and <code>ap_destroy_pool</code>,
896    respectively. (The difference is that
897    <code>ap_clear_pool</code> frees resources associated with the
898    pool, while <code>ap_destroy_pool</code> also deallocates the
899    pool itself. In the former case, you can allocate new resources
900    within the pool, and clear it again, and so forth; in the
901    latter case, it is simply gone).</p>
902
903    <p>One final note --- sub-requests have their own resource
904    pools, which are sub-pools of the resource pool for the main
905    request. The polite way to reclaim the resources associated
906    with a sub request which you have allocated (using the
907    <code>ap_sub_req_...</code> functions) is
908    <code>ap_destroy_sub_req</code>, which frees the resource pool.
909    Before calling this function, be sure to copy anything that you
910    care about which might be allocated in the sub-request's
911    resource pool into someplace a little less volatile (for
912    instance, the filename in its <code>request_rec</code>
913    structure).</p>
914
915    <p>(Again, under most circumstances, you shouldn't feel obliged
916    to call this function; only 2K of memory or so are allocated
917    for a typical sub request, and it will be freed anyway when the
918    main request pool is cleared. It is only when you are
919    allocating many, many sub-requests for a single main request
920    that you should seriously consider the
921    <code>ap_destroy_...</code> functions).</p>
922
923    <h2><a id="config" name="config">Configuration, commands and
924    the like</a></h2>
925    One of the design goals for this server was to maintain
926    external compatibility with the NCSA 1.3 server --- that is, to
927    read the same configuration files, to process all the
928    directives therein correctly, and in general to be a drop-in
929    replacement for NCSA. On the other hand, another design goal
930    was to move as much of the server's functionality into modules
931    which have as little as possible to do with the monolithic
932    server core. The only way to reconcile these goals is to move
933    the handling of most commands from the central server into the
934    modules.
935
936    <p>However, just giving the modules command tables is not
937    enough to divorce them completely from the server core. The
938    server has to remember the commands in order to act on them
939    later. That involves maintaining data which is private to the
940    modules, and which can be either per-server, or per-directory.
941    Most things are per-directory, including in particular access
942    control and authorization information, but also information on
943    how to determine file types from suffixes, which can be
944    modified by <code>AddType</code> and <code>DefaultType</code>
945    directives, and so forth. In general, the governing philosophy
946    is that anything which <em>can</em> be made configurable by
947    directory should be; per-server information is generally used
948    in the standard set of modules for information like
949    <code>Alias</code>es and <code>Redirect</code>s which come into
950    play before the request is tied to a particular place in the
951    underlying file system.</p>
952
953    <p>Another requirement for emulating the NCSA server is being
954    able to handle the per-directory configuration files, generally
955    called <code>.htaccess</code> files, though even in the NCSA
956    server they can contain directives which have nothing at all to
957    do with access control. Accordingly, after URI -&gt; filename
958    translation, but before performing any other phase, the server
959    walks down the directory hierarchy of the underlying
960    filesystem, following the translated pathname, to read any
961    <code>.htaccess</code> files which might be present. The
962    information which is read in then has to be <em>merged</em>
963    with the applicable information from the server's own config
964    files (either from the <code>&lt;Directory&gt;</code> sections
965    in <code>access.conf</code>, or from defaults in
966    <code>srm.conf</code>, which actually behaves for most purposes
967    almost exactly like <code>&lt;Directory /&gt;</code>).</p>
968
969    <p>Finally, after having served a request which involved
970    reading <code>.htaccess</code> files, we need to discard the
971    storage allocated for handling them. That is solved the same
972    way it is solved wherever else similar problems come up, by
973    tying those structures to the per-transaction resource
974    pool.</p>
975
976    <h3><a id="per-dir" name="per-dir">Per-directory configuration
977    structures</a></h3>
    Let's look at how all of this plays out in
    <code>mod_mime.c</code>, which defines the file typing handler
    which emulates the NCSA server's behavior of determining file
    types from suffixes. What we'll be looking at, here, is the
    code which implements the <code>AddType</code> and
    <code>AddEncoding</code> commands. These commands can appear in
    <code>.htaccess</code> files, so they must be handled in the
    module's private per-directory data, which in fact consists of
    two separate <code>table</code>s for MIME types and encoding
    information, and is declared as follows:
<pre>
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
</pre>
994    When the server is reading a configuration file, or
995    <code>&lt;Directory&gt;</code> section, which includes one of
996    the MIME module's commands, it needs to create a
997    <code>mime_dir_config</code> structure, so those commands have
    something to act on. It does this by invoking the function it
    finds in the module's "create per-dir config" slot, with two
    arguments: the name of the directory to which this
1001    configuration information applies (or <code>NULL</code> for
1002    <code>srm.conf</code>), and a pointer to a resource pool in
1003    which the allocation should happen.
1004
1005    <p>(If we are reading a <code>.htaccess</code> file, that
1006    resource pool is the per-request resource pool for the request;
1007    otherwise it is a resource pool which is used for configuration
1008    data, and cleared on restarts. Either way, it is important for
1009    the structure being created to vanish when the pool is cleared,
1010    by registering a cleanup on the pool if necessary).</p>
1011
    <p>For the MIME module, the per-dir config creation function
    just <code>ap_palloc</code>s the structure above, and creates
    a couple of <code>table</code>s to fill it. That looks like
    this:</p>
<pre>
void *create_mime_dir_config (pool *p, char *dummy)
{
    /* The structure lives in p, so it vanishes when p is cleared */
    mime_dir_config *new =
      (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));

    /* Start each table with room for 4 entries */
    new-&gt;forced_types = ap_make_table (p, 4);
    new-&gt;encoding_types = ap_make_table (p, 4);

    return new;
}
</pre>
1028    Now, suppose we've just read in a <code>.htaccess</code> file.
1029    We already have the per-directory configuration structure for
1030    the next directory up in the hierarchy. If the
1031    <code>.htaccess</code> file we just read in didn't have any
1032    <code>AddType</code> or <code>AddEncoding</code> commands, its
1033    per-directory config structure for the MIME module is still
1034    valid, and we can just use it. Otherwise, we need to merge the
1035    two structures somehow.
1036
1037    <p>To do that, the server invokes the module's per-directory
1038    config merge function, if one is present. That function takes
1039    three arguments: the two structures being merged, and a
1040    resource pool in which to allocate the result. For the MIME
1041    module, all that needs to be done is overlay the tables from
1042    the new per-directory config structure with those from the
1043    parent:</p>
<pre>
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new =
      (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));

    /* Overlay the subdirectory's tables on the parent's */
    new-&gt;forced_types = ap_overlay_tables (p, subdir-&gt;forced_types,
                                        parent_dir-&gt;forced_types);
    new-&gt;encoding_types = ap_overlay_tables (p, subdir-&gt;encoding_types,
                                          parent_dir-&gt;encoding_types);

    return new;
}
</pre>
1060    As a note --- if there is no per-directory merge function
1061    present, the server will just use the subdirectory's
1062    configuration info, and ignore the parent's. For some modules,
1063    that works just fine (<em>e.g.</em>, for the includes module,
1064    whose per-directory configuration information consists solely
1065    of the state of the <code>XBITHACK</code>), and for those
1066    modules, you can just not declare one, and leave the
1067    corresponding structure slot in the module itself
1068    <code>NULL</code>.
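    <p>The effect of such a merge can be illustrated with a toy
    table. This sketch is an assumption-laden model rather than the
    server's <code>table</code> code: all <code>toy_</code> names
    are hypothetical, the table is a fixed-size array, and lookups
    return the first matching entry, so placing the overlay's
    entries ahead of the base's lets the subdirectory win.</p>

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX 16

typedef struct { const char *key, *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; int n; } toy_table;

/* Append an entry; returns 0 on success, -1 if the table is full. */
int toy_add(toy_table *t, const char *key, const char *val)
{
    if (t->n == TOY_MAX)
        return -1;
    t->e[t->n].key = key;
    t->e[t->n].val = val;
    t->n++;
    return 0;
}

/* Look a key up; the first matching entry wins. */
const char *toy_get(const toy_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->e[i].key, key) == 0)
            return t->e[i].val;
    return NULL;
}

/* Build out from overlay's entries followed by base's entries, so
 * that overlay settings shadow base settings with the same key. */
int toy_overlay(toy_table *out, const toy_table *overlay,
                const toy_table *base)
{
    int i;
    assert(out != overlay && out != base);
    out->n = 0;
    for (i = 0; i < overlay->n; i++)
        if (toy_add(out, overlay->e[i].key, overlay->e[i].val) != 0)
            return -1;
    for (i = 0; i < base->n; i++)
        if (toy_add(out, base->e[i].key, base->e[i].val) != 0)
            return -1;
    return 0;
}
```

    <p>If the parent maps <code>txt</code> to one type and the
    subdirectory maps it to another, a lookup in the merged table
    sees the subdirectory's value, while keys set only in the
    parent remain visible.</p>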
1069
1070    <h3><a id="commands" name="commands">Command handling</a></h3>
1071    Now that we have these structures, we need to be able to figure
1072    out how to fill them. That involves processing the actual
1073    <code>AddType</code> and <code>AddEncoding</code> commands. To
    find commands, the server looks in the module's <code>command
    table</code>. That table contains information on how many
    arguments the commands take, and in what formats, where they
    are permitted, and so forth. That information is sufficient to
1078    allow the server to invoke most command-handling functions with
1079    pre-parsed arguments. Without further ado, let's look at the
1080    <code>AddType</code> command handler, which looks like this
1081    (the <code>AddEncoding</code> command looks basically the same,
1082    and won't be shown here):
<pre>
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == '.') ++ext;    /* skip a leading dot in the extension */
    ap_table_set (m-&gt;forced_types, ext, ct);
    return NULL;               /* NULL means no error */
}
</pre>
    This command handler is unusually simple. As you can see, it
    takes four arguments: a pointer to a <code>cmd_parms</code>
    structure, the per-directory configuration structure for the
    module in question, and two pre-parsed arguments. The
    <code>cmd_parms</code> structure contains a bunch of arguments
    which are frequently of use to some, but not all, commands,
    including a resource pool (from which memory can be allocated,
    and to which cleanups should be tied), and the (virtual) server
    being configured, from which the module's per-server
    configuration data can be obtained if required.
1101
1102    <p>Another way in which this particular command handler is
1103    unusually simple is that there are no error conditions which it
1104    can encounter. If there were, it could return an error message
1105    instead of <code>NULL</code>; this causes an error to be
1106    printed out on the server's <code>stderr</code>, followed by a
1107    quick exit, if it is in the main config files; for a
1108    <code>.htaccess</code> file, the syntax error is logged in the
1109    server error log (along with an indication of where it came
1110    from), and the request is bounced with a server error response
1111    (HTTP error status, code 500).</p>
1112
1113    <p>The MIME module's command table has entries for these
1114    commands, which look like this:</p>
<pre>
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
    "a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
    "an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
</pre>
1124    The entries in these tables are:
1125
1126    <ul>
1127      <li>The name of the command</li>
1128
1129      <li>The function which handles it</li>
1130
      <li>A <code>(void *)</code> pointer, which is passed in the
1132      <code>cmd_parms</code> structure to the command handler ---
1133      this is useful in case many similar commands are handled by
1134      the same function.</li>
1135
1136      <li>A bit mask indicating where the command may appear. There
1137      are mask bits corresponding to each
1138      <code>AllowOverride</code> option, and an additional mask
1139      bit, <code>RSRC_CONF</code>, indicating that the command may
1140      appear in the server's own config files, but <em>not</em> in
1141      any <code>.htaccess</code> file.</li>
1142
      <li>A flag indicating how many arguments the command handler
      wants pre-parsed, and how they should be passed in.
      <code>TAKE2</code> indicates two pre-parsed arguments. Other
      options are <code>TAKE1</code>, which indicates one
      pre-parsed argument; <code>FLAG</code>, which indicates that
      the argument should be <code>On</code> or <code>Off</code>,
      and is passed in as a boolean flag; and <code>RAW_ARGS</code>,
      which causes the server to give the command the raw, unparsed
      arguments (everything but the command name itself). There is
      also <code>ITERATE</code>, which means that the handler looks
      the same as <code>TAKE1</code>, but that if multiple
      arguments are present, it should be called multiple times,
      once per argument; and finally <code>ITERATE2</code>, which
      indicates that the command handler looks like a
      <code>TAKE2</code>, but if more arguments are present, then
      it should be called multiple times, holding the first
      argument constant.</li>
1159
1160      <li>Finally, we have a string which describes the arguments
1161      that should be present. If the arguments in the actual config
1162      file are not as required, this string will be used to help
1163      give a more specific error message. (You can safely leave
1164      this <code>NULL</code>).</li>
1165    </ul>
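    <p>How the argument-style flag drives the calls to a handler
    can be sketched with a small dispatcher. This is an
    illustration only: the <code>toy_</code> names, the simplified
    handler signature (no <code>cmd_parms</code>) and the error
    strings are all hypothetical, and the server's real dispatch
    code handles more styles than the four shown.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_TAKE1, TOY_TAKE2, TOY_ITERATE, TOY_ITERATE2 } toy_how;

/* Simplified handler shape (hypothetical): a2 is NULL for the
 * one-argument styles. Returns NULL on success or an error string. */
typedef const char *(*toy_handler)(void *conf, const char *a1,
                                   const char *a2);

/* Call fn according to the argument style; returns NULL on success
 * or the first error message encountered. */
const char *toy_invoke(toy_how how, toy_handler fn, void *conf,
                       int argc, const char **argv)
{
    const char *err;
    int i;

    assert(fn != NULL);
    switch (how) {
    case TOY_TAKE1:
        if (argc != 1) return "takes one argument";
        return fn(conf, argv[0], NULL);
    case TOY_TAKE2:
        if (argc != 2) return "takes two arguments";
        return fn(conf, argv[0], argv[1]);
    case TOY_ITERATE:
        /* One call per argument. */
        if (argc < 1) return "requires at least one argument";
        for (i = 0; i < argc; i++)
            if ((err = fn(conf, argv[i], NULL)) != NULL) return err;
        return NULL;
    case TOY_ITERATE2:
        /* One call per extra argument, first argument held constant. */
        if (argc < 2) return "requires at least two arguments";
        for (i = 1; i < argc; i++)
            if ((err = fn(conf, argv[0], argv[i])) != NULL) return err;
        return NULL;
    }
    return "unknown argument style";
}

/* Example handler: records how it was called. */
typedef struct {
    int calls;
    const char *last_first;
    const char *last_second;
} toy_record;

const char *toy_record_call(void *conf, const char *a1, const char *a2)
{
    toy_record *r = conf;
    r->calls++;
    r->last_first = a1;
    r->last_second = a2;
    return NULL;
}
```

    <p>Given the four arguments <code>text/html html htm
    shtml</code> and the <code>TOY_ITERATE2</code> style, the
    handler runs three times, each time with
    <code>text/html</code> as its first argument. The same
    arguments under <code>TOY_TAKE2</code> are rejected before the
    handler is ever called.</p>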
1166    Finally, having set this all up, we have to use it. This is
1167    ultimately done in the module's handlers, specifically for its
1168    file-typing handler, which looks more or less like this; note
1169    that the per-directory configuration structure is extracted
1170    from the <code>request_rec</code>'s per-directory configuration
1171    vector by using the <code>ap_get_module_config</code> function.
1172
<pre>
int find_ct(request_rec *r)
{
    int i;
    /* work on a private copy, since the name is modified below */
    char *fn = ap_pstrdup (r-&gt;pool, r-&gt;filename);
    mime_dir_config *conf = (mime_dir_config *)
             ap_get_module_config(r-&gt;per_dir_config, &amp;mime_module);
    char *type;

    if (S_ISDIR(r-&gt;finfo.st_mode)) {
        r-&gt;content_type = DIR_MAGIC_TYPE;
        return OK;
    }

    /* find the last extension; decline if there is none */
    if((i=ap_rind(fn,'.')) &lt; 0) return DECLINED;
    ++i;

    if ((type = ap_table_get (conf-&gt;encoding_types, &amp;fn[i])))
    {
        r-&gt;content_encoding = type;

        /* go back to previous extension to try to use it as a type */

        fn[i-1] = '\0';
        if((i=ap_rind(fn,'.')) &lt; 0) return OK;
        ++i;
    }

    if ((type = ap_table_get (conf-&gt;forced_types, &amp;fn[i])))
    {
        r-&gt;content_type = type;
    }

    return OK;
}
</pre>
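    <p>The two-step suffix logic of the handler above (encoding
    first, then type from the previous extension) can be tried
    outside the server. The sketch below is a standalone model, not
    <code>mod_mime</code>: the <code>toy_</code> names are
    hypothetical, two hard-coded lookup functions stand in for the
    per-directory tables, and <code>strrchr</code> replaces
    <code>ap_rind</code>.</p>

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *type;         /* NULL if no type was found */
    const char *encoding;     /* NULL if no encoding was found */
} toy_result;

/* Hard-coded stand-ins for the encoding_types and forced_types
 * tables; the entries are examples, not server defaults. */
static const char *toy_encoding_for(const char *ext)
{
    if (strcmp(ext, "gz") == 0) return "x-gzip";
    return NULL;
}

static const char *toy_type_for(const char *ext)
{
    if (strcmp(ext, "html") == 0) return "text/html";
    if (strcmp(ext, "txt") == 0) return "text/plain";
    return NULL;
}

/* Returns 0 if name has no extension (the DECLINED case), else 1.
 * name is modified in place, which is why find_ct copies the
 * filename before working on it. */
int toy_find_ct(char *name, toy_result *out)
{
    char *dot;
    const char *enc;

    assert(name != NULL && out != NULL);
    out->type = NULL;
    out->encoding = NULL;

    dot = strrchr(name, '.');
    if (dot == NULL)
        return 0;

    if ((enc = toy_encoding_for(dot + 1)) != NULL) {
        out->encoding = enc;
        /* Chop the encoding suffix and retry with the one before. */
        *dot = '\0';
        dot = strrchr(name, '.');
        if (dot == NULL)
            return 1;
    }

    out->type = toy_type_for(dot + 1);
    return 1;
}
```

    <p>With these tables, <code>index.html.gz</code> gets both an
    encoding and a type, <code>archive.gz</code> gets only an
    encoding, and a name without a dot is declined.</p>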
1210
1211    <h3><a id="servconf" name="servconf">Side notes --- per-server
1212    configuration, virtual servers, <em>etc</em>.</a></h3>
    The ideas behind per-server module configuration are
    basically the same as those for per-directory configuration;
1215    there is a creation function and a merge function, the latter
1216    being invoked where a virtual server has partially overridden
1217    the base server configuration, and a combined structure must be
1218    computed. (As with per-directory configuration, the default if
1219    no merge function is specified, and a module is configured in
1220    some virtual server, is that the base configuration is simply
1221    ignored).
1222
1223    <p>The only substantial difference is that when a command needs
1224    to configure the per-server private module data, it needs to go
1225    to the <code>cmd_parms</code> data to get at it. Here's an
1226    example, from the alias module, which also indicates how a
1227    syntax error can be returned (note that the per-directory
1228    configuration argument to the command handler is declared as a
1229    dummy, since the module doesn't actually have per-directory
1230    config data):</p>
<pre>
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
{
    /* per-server data is reached through the cmd_parms structure */
    server_rec *s = cmd-&gt;server;
    alias_server_conf *conf = (alias_server_conf *)
            ap_get_module_config(s-&gt;module_config,&amp;alias_module);
    alias_entry *new = ap_push_array (conf-&gt;redirects);

    /* a non-NULL return value is reported as a syntax error */
    if (!ap_is_url (url)) return "Redirect to non-URL";

    new-&gt;fake = f; new-&gt;real = url;
    return NULL;
}
</pre>
1250
1251  </body>
1252</html>
1253
1254