<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content="HTML Tidy, see www.w3.org" />

    <title>Apache API notes</title>
  </head>
  <!-- Background white, links blue (unvisited), navy (visited), red (active) -->

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
  vlink="#000080" alink="#FF0000">
    <div align="CENTER">
      <img src="../images/sub.gif" alt="[APACHE DOCUMENTATION]" />

      <h3>Apache HTTP Server Version 1.3</h3>
    </div>


    <h1 align="CENTER">Apache API notes</h1>
    These are some notes on the Apache API and the data structures
    you have to deal with, <em>etc.</em> They are not yet nearly
    complete, but hopefully, they will help you get your bearings.
    Keep in mind that the API is still subject to change as we gain
    experience with it. (See the TODO file for what <em>might</em>
    be coming). However, it will be easy to adapt modules to any
    changes that are made. (We have more modules to adapt than you
    do).

    <p>A few notes on general pedagogical style here. In the
    interest of conciseness, all structure declarations here are
    incomplete --- the real ones have more slots that I'm not
    telling you about. For the most part, these are reserved to one
    component of the server core or another, and should be altered
    by modules with caution. However, in some cases, they really
    are things I just haven't gotten around to yet.
    Welcome to the
    bleeding edge.</p>

    <p>Finally, here's an outline, to give you some bare idea of
    what's coming up, and in what order:</p>

    <ul>
      <li>
        <a href="#basics">Basic concepts.</a>

        <ul>
          <li><a href="#HMR">Handlers, Modules, and
          Requests</a></li>

          <li><a href="#moduletour">A brief tour of a
          module</a></li>
        </ul>
      </li>

      <li>
        <a href="#handlers">How handlers work</a>

        <ul>
          <li><a href="#req_tour">A brief tour of the
          <code>request_rec</code></a></li>

          <li><a href="#req_orig">Where request_rec structures come
          from</a></li>

          <li><a href="#req_return">Handling requests, declining,
          and returning error codes</a></li>

          <li><a href="#resp_handlers">Special considerations for
          response handlers</a></li>

          <li><a href="#auth_handlers">Special considerations for
          authentication handlers</a></li>

          <li><a href="#log_handlers">Special considerations for
          logging handlers</a></li>
        </ul>
      </li>

      <li><a href="#pools">Resource allocation and resource
      pools</a></li>

      <li>
        <a href="#config">Configuration, commands and the like</a>

        <ul>
          <li><a href="#per-dir">Per-directory configuration
          structures</a></li>

          <li><a href="#commands">Command handling</a></li>

          <li><a href="#servconf">Side notes --- per-server
          configuration, virtual servers, <em>etc</em>.</a></li>
        </ul>
      </li>
    </ul>

    <h2><a id="basics" name="basics">Basic concepts.</a></h2>
    We begin with an overview of the basic concepts behind the API,
    and how they are manifested in the code.

    <h3><a id="HMR" name="HMR">Handlers, Modules, and
    Requests</a></h3>
    Apache breaks down request handling into a series of steps,
    more or less the same way the Netscape server API does
    (although this API has a few more stages than NetSite does, as
    hooks for stuff I thought might be useful in the future).
    These
    are:

    <ul>
      <li>URI -> Filename translation</li>

      <li>Auth ID checking [is the user who they say they
      are?]</li>

      <li>Auth access checking [is the user authorized
      <em>here</em>?]</li>

      <li>Access checking other than auth</li>

      <li>Determining MIME type of the object requested</li>

      <li>`Fixups' --- there aren't any of these yet, but the phase
      is intended as a hook for possible extensions like
      <code>SetEnv</code>, which don't really fit well
      elsewhere.</li>

      <li>Actually sending a response back to the client.</li>

      <li>Logging the request</li>
    </ul>
    These phases are handled by looking at each of a succession of
    <em>modules</em>, checking whether each of them has a handler
    for the phase, and attempting to invoke it if so. The handler
    can typically do one of three things:

    <ul>
      <li><em>Handle</em> the request, and indicate that it has
      done so by returning the magic constant <code>OK</code>.</li>

      <li><em>Decline</em> to handle the request, by returning the
      magic integer constant <code>DECLINED</code>. In this case,
      the server behaves in all respects as if the handler simply
      hadn't been there.</li>

      <li>Signal an error, by returning one of the HTTP error
      codes. This terminates normal handling of the request,
      although an ErrorDocument may be invoked to try to mop up,
      and it will be logged in any case.</li>
    </ul>
    Most phases are terminated by the first module that handles
    them; however, for logging, `fixups', and non-access
    authentication checking, all handlers always run (barring an
    error). Also, the response phase is unique in that modules may
    declare multiple handlers for it, via a dispatch table keyed on
    the MIME type of the requested object.
    Modules may declare a
    response-phase handler which can handle <em>any</em> request,
    by giving it the key <code>*/*</code> (<em>i.e.</em>, a
    wildcard MIME type specification). However, wildcard handlers
    are only invoked if the server has already tried and failed to
    find a more specific response handler for the MIME type of the
    requested object (either none existed, or they all declined).

    <p>The handlers themselves are functions of one argument (a
    <code>request_rec</code> structure; <em>vide infra</em>), which
    return an integer, as above.</p>

    <h3><a id="moduletour" name="moduletour">A brief tour of a
    module</a></h3>
    At this point, we need to explain the structure of a module.
    Our candidate will be one of the messier ones, the CGI module
    --- this handles both CGI scripts and the
    <code>ScriptAlias</code> config file command. It's actually a
    great deal more complicated than most modules, but if we're
    going to have only one example, it might as well be the one
    with its fingers in every place.

    <p>Let's begin with handlers. In order to handle the CGI
    scripts, the module declares a response handler for them.
    Because of <code>ScriptAlias</code>, it also has handlers for
    the name translation phase (to recognize
    <code>ScriptAlias</code>ed URIs), and the type-checking phase
    (any <code>ScriptAlias</code>ed request is typed as a CGI
    script).</p>

    <p>The module needs to maintain some per (virtual) server
    information, namely, the <code>ScriptAlias</code>es in effect;
    the module structure therefore contains pointers to a function
    which builds these structures, and to another which combines
    two of them (in case the main server and a virtual server both
    have <code>ScriptAlias</code>es declared).</p>

    <p>Finally, this module contains code to handle the
    <code>ScriptAlias</code> command itself.
    This particular module
    only declares one command, but there could be more, so modules
    have <em>command tables</em> which declare their commands, and
    describe where they are permitted, and how they are to be
    invoked.</p>

    <p>A final note on the declared types of the arguments of some
    of these commands: a <code>pool</code> is a pointer to a
    <em>resource pool</em> structure; these are used by the server
    to keep track of the memory which has been allocated, files
    opened, <em>etc.</em>, either to service a particular request,
    or to handle the process of configuring itself. That way, when
    the request is over (or, for the configuration pool, when the
    server is restarting), the memory can be freed, and the files
    closed, <em>en masse</em>, without anyone having to write
    explicit code to track them all down and dispose of them. Also,
    a <code>cmd_parms</code> structure contains various information
    about the config file being read, and other status information,
    which is sometimes of use to the function which processes a
    config-file command (such as <code>ScriptAlias</code>). With no
    further ado, the module itself:</p>
<pre>
/* Declarations of handlers. */

int translate_scriptalias (request_rec *);
int type_scriptalias (request_rec *);
int cgi_handler (request_rec *);

/* Subsidiary dispatch table for response-phase handlers, by MIME type */

handler_rec cgi_handlers[] = {
{ "application/x-httpd-cgi", cgi_handler },
{ NULL }
};

/* Declarations of routines to manipulate the module's configuration
 * info.  Note that these are returned, and passed in, as void *'s;
 * the server core keeps track of them, but it doesn't, and can't,
 * know their internal structure.
 */

void *make_cgi_server_config (pool *);
void *merge_cgi_server_config (pool *, void *, void *);

/* Declarations of routines to handle config-file commands */

extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
                          char *real);

command_rec cgi_cmds[] = {
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
    "a fakename and a realname"},
{ NULL }
};

module cgi_module = {
   STANDARD_MODULE_STUFF,
   NULL,                     /* initializer */
   NULL,                     /* dir config creator */
   NULL,                     /* dir merger --- default is to override */
   make_cgi_server_config,   /* server config */
   merge_cgi_server_config,  /* merge server config */
   cgi_cmds,                 /* command table */
   cgi_handlers,             /* handlers */
   translate_scriptalias,    /* filename translation */
   NULL,                     /* check_user_id */
   NULL,                     /* check auth */
   NULL,                     /* check access */
   type_scriptalias,         /* type_checker */
   NULL,                     /* fixups */
   NULL,                     /* logger */
   NULL                      /* header parser */
};
</pre>

    <h2><a id="handlers" name="handlers">How handlers work</a></h2>
    The sole argument to handlers is a <code>request_rec</code>
    structure. This structure describes a particular request which
    has been made to the server, on behalf of a client. In most
    cases, each connection to the client generates only one
    <code>request_rec</code> structure.

    <h3><a id="req_tour" name="req_tour">A brief tour of the
    <code>request_rec</code></a></h3>
    The <code>request_rec</code> contains pointers to a resource
    pool which will be cleared when the server is finished handling
    the request; to structures containing per-server and
    per-connection information; and, most importantly, to
    information on the request itself.
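    <p>The handler contract described so far (one request argument,
    an integer result, and a phase that stops at the first handler
    which does not decline) can be sketched in a few lines. This is
    only an illustration under stated assumptions:
    <code>mini_request</code>, <code>mini_handler</code>,
    <code>run_translate_phase</code> and the two handlers are
    hypothetical stand-ins, not the server's real types or
    dispatch code, and the file paths are made up.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration; not the real Apache declarations. */
#define OK         0
#define DECLINED  -1
#define NOT_FOUND  404

typedef struct {
    const char *uri;       /* what the client asked for */
    const char *filename;  /* filled in by a translation handler */
} mini_request;

typedef int (*mini_handler)(mini_request *);

/* Hypothetical handler: only claims URIs under /cgi-bin/. */
static int translate_cgi(mini_request *r)
{
    if (strncmp(r->uri, "/cgi-bin/", 9) != 0)
        return DECLINED;
    r->filename = "/hypothetical/cgi-bin/script";
    return OK;
}

/* Hypothetical fallback: maps "/" and signals an error for anything else. */
static int translate_root(mini_request *r)
{
    if (strcmp(r->uri, "/") != 0)
        return NOT_FOUND;
    r->filename = "/hypothetical/htdocs/index.html";
    return OK;
}

/* One slot per "module"; NULL means the module has no handler for this phase. */
static mini_handler translate_handlers[] = { NULL, translate_cgi, translate_root };

/* Try each handler in turn; the first result other than DECLINED ends the phase. */
int run_translate_phase(mini_request *r)
{
    size_t i;
    size_t n = sizeof translate_handlers / sizeof translate_handlers[0];

    assert(r != NULL && r->uri != NULL);
    for (i = 0; i < n; i++) {
        int status;

        if (translate_handlers[i] == NULL)
            continue;
        status = translate_handlers[i](r);
        if (status != DECLINED)
            return status;   /* OK or an error code */
    }
    return DECLINED;
}

/* Small helper so the outcome for one URI can be checked in a single call. */
int translate_status(const char *uri)
{
    mini_request r;

    r.uri = uri;
    r.filename = NULL;
    return run_translate_phase(&r);
}
```

    <p>Calling <code>translate_status("/cgi-bin/test")</code> goes
    through the first real handler; a URI that neither handler
    recognizes comes back as an error code from the fallback. Phases
    in which every handler runs (logging, for example) would need a
    loop that does not stop at the first <code>OK</code>.</p>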

    <p>The most important such information is a small set of
    character strings describing attributes of the object being
    requested, including its URI, filename, content-type and
    content-encoding (these being filled in by the translation and
    type-check handlers which handle the request,
    respectively).</p>

    <p>Other commonly used data items are tables giving the MIME
    headers on the client's original request, MIME headers to be
    sent back with the response (which modules can add to at will),
    and environment variables for any subprocesses which are
    spawned off in the course of servicing the request. These
    tables are manipulated using the <code>ap_table_get</code> and
    <code>ap_table_set</code> routines.</p>

    <blockquote>
      Note that the <samp>Content-type</samp> header value
      <em>cannot</em> be set by module content-handlers using the
      <samp>ap_table_*()</samp> routines. Rather, it is set by
      pointing the <samp>content_type</samp> field in the
      <samp>request_rec</samp> structure to an appropriate string.
      <em>E.g.</em>,
<pre>
  r->content_type = "text/html";
</pre>
    </blockquote>
    Finally, there are pointers to two data structures which, in
    turn, point to per-module configuration structures.
    Specifically, these hold pointers to the data structures which
    the module has built to describe the way it has been configured
    to operate in a given directory (via <code>.htaccess</code>
    files or <code>&lt;Directory&gt;</code> sections), and to
    private data it has built in the course of servicing the
    request (so modules' handlers for one phase can pass `notes' to
    their handlers for other phases). There is another such
    configuration vector in the <code>server_rec</code> data
    structure pointed to by the <code>request_rec</code>, which
    contains per (virtual) server configuration data.
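    <p>To make the idea of a header table concrete, here is a
    minimal sketch of a key/value table with get and set
    operations. It is an assumption-laden toy:
    <code>mini_table</code>, <code>mini_table_get</code> and
    <code>mini_table_set</code> are hypothetical names, the storage
    is a fixed-size array, and strings are stored by pointer without
    copying; the real routines also manage storage through a pool,
    which this sketch leaves out. Keys are compared without regard
    to case, as header names are.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MINI_TABLE_MAX 16

/* Hypothetical stand-in for a header table; not the server's table type. */
typedef struct {
    const char *key[MINI_TABLE_MAX];
    const char *val[MINI_TABLE_MAX];
    int nelts;
} mini_table;

/* Case-insensitive comparison of two header names. */
static int same_key(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the value stored under key, or NULL if there is none. */
const char *mini_table_get(const mini_table *t, const char *key)
{
    int i;

    assert(t != NULL && key != NULL);
    for (i = 0; i < t->nelts; i++)
        if (same_key(t->key[i], key))
            return t->val[i];
    return NULL;
}

/* Store val under key, replacing any existing entry. Returns 0 if full. */
int mini_table_set(mini_table *t, const char *key, const char *val)
{
    int i;

    assert(t != NULL && key != NULL);
    for (i = 0; i < t->nelts; i++) {
        if (same_key(t->key[i], key)) {
            t->val[i] = val;
            return 1;
        }
    }
    if (t->nelts == MINI_TABLE_MAX)
        return 0;
    t->key[t->nelts] = key;
    t->val[t->nelts] = val;
    t->nelts++;
    return 1;
}

/* Demo: setting the same header twice keeps only the later value. */
const char *demo_location_header(void)
{
    mini_table headers_out;

    headers_out.nelts = 0;
    mini_table_set(&headers_out, "Location", "/old");
    mini_table_set(&headers_out, "location", "/new");
    return mini_table_get(&headers_out, "LOCATION");
}
```

    <p>In a module, the equivalent calls would be made on
    <code>r->headers_in</code>, <code>r->headers_out</code> or
    <code>r->subprocess_env</code>.</p>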

    <p>Here is an abridged declaration, giving the fields most
    commonly used:</p>
<pre>
struct request_rec {

  pool *pool;
  conn_rec *connection;
  server_rec *server;

  /* What object is being requested */

  char *uri;
  char *filename;
  char *path_info;
  char *args;           /* QUERY_ARGS, if any */
  struct stat finfo;    /* Set by server core;
                         * st_mode set to zero if no such file */

  char *content_type;
  char *content_encoding;

  /* MIME header environments, in and out.  Also, an array containing
   * environment variables to be passed to subprocesses, so people can
   * write modules to add to that environment.
   *
   * The difference between headers_out and err_headers_out is that
   * the latter are printed even on error, and persist across internal
   * redirects (so the headers printed for ErrorDocument handlers will
   * have them).
   */

  table *headers_in;
  table *headers_out;
  table *err_headers_out;
  table *subprocess_env;

  /* Info about the request itself... */

  int header_only;     /* HEAD request, as opposed to GET */
  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
  char *method;        /* GET, HEAD, POST, <em>etc.</em> */
  int method_number;   /* M_GET, M_POST, <em>etc.</em> */

  /* Info for logging */

  char *the_request;
  int bytes_sent;

  /* A flag which modules can set, to indicate that the data being
   * returned is volatile, and clients should be told not to cache it.
   */

  int no_cache;

  /* Various other config info which may change with .htaccess files
   * These are config vectors, with one void* pointer for each module
   * (the thing pointed to being the module's business).
   */

  void *per_dir_config;   /* Options set in config files, <em>etc.</em> */
  void *request_config;   /* Notes on *this* request */

};

</pre>

    <h3><a id="req_orig" name="req_orig">Where request_rec
    structures come from</a></h3>
    Most <code>request_rec</code> structures are built by reading
    an HTTP request from a client, and filling in the fields.
    However, there are a few exceptions:

    <ul>
      <li>If the request is to an imagemap, a type map
      (<em>i.e.</em>, a <code>*.var</code> file), or a CGI script
      which returned a local `Location:', then the resource which
      the user requested is going to be ultimately located by some
      URI other than what the client originally supplied. In this
      case, the server does an <em>internal redirect</em>,
      constructing a new <code>request_rec</code> for the new URI,
      and processing it almost exactly as if the client had
      requested the new URI directly.</li>

      <li>If some handler signaled an error, and an
      <code>ErrorDocument</code> is in scope, the same internal
      redirect machinery comes into play.</li>

      <li>
        Finally, a handler occasionally needs to investigate `what
        would happen if' some other request were run. For instance,
        the directory indexing module needs to know what MIME type
        would be assigned to a request for each directory entry, in
        order to figure out what icon to use.

        <p>Such handlers can construct a <em>sub-request</em>,
        using the functions <code>ap_sub_req_lookup_file</code>,
        <code>ap_sub_req_lookup_uri</code>, and
        <code>ap_sub_req_method_uri</code>; these construct a new
        <code>request_rec</code> structure and process it as you
        would expect, up to but not including the point of actually
        sending a response.
        (These functions skip over the access
        checks if the sub-request is for a file in the same
        directory as the original request).</p>

        <p>(Server-side includes work by building sub-requests and
        then actually invoking the response handler for them, via
        the function <code>ap_run_sub_req</code>).</p>
      </li>
    </ul>

    <h3><a id="req_return" name="req_return">Handling requests,
    declining, and returning error codes</a></h3>
    As discussed above, each handler, when invoked to handle a
    particular <code>request_rec</code>, has to return an
    <code>int</code> to indicate what happened. That can be one of

    <ul>
      <li>OK --- the request was handled successfully. This may or
      may not terminate the phase.</li>

      <li>DECLINED --- no erroneous condition exists, but the
      module declines to handle the phase; the server tries to find
      another.</li>

      <li>an HTTP error code, which aborts handling of the
      request.</li>
    </ul>
    Note that if the error code returned is <code>REDIRECT</code>,
    then the module should put a <code>Location</code> in the
    request's <code>headers_out</code>, to indicate where the
    client should be redirected <em>to</em>.

    <h3><a id="resp_handlers" name="resp_handlers">Special
    considerations for response handlers</a></h3>
    Handlers for most phases do their work by simply setting a few
    fields in the <code>request_rec</code> structure (or, in the
    case of access checkers, simply by returning the correct error
    code). However, response handlers have to actually send a
    response back to the client.

    <p>They should begin by sending an HTTP response header, using
    the function <code>ap_send_http_header</code>. (You don't have
    to do anything special to skip sending the header for HTTP/0.9
    requests; the function figures out on its own that it shouldn't
    do anything).
    If the request is marked
    <code>header_only</code>, that's all they should do; they
    should return after that, without attempting any further
    output.</p>

    <p>Otherwise, they should produce a response body which responds
    to the client as appropriate. The primitives for this are
    <code>ap_rputc</code> and <code>ap_rprintf</code>, for
    internally generated output, and <code>ap_send_fd</code>, to
    copy the contents of some <code>FILE *</code> straight to the
    client.</p>

    <p>At this point, you should more or less understand the
    following piece of code, which is the handler which handles
    <code>GET</code> requests which have no more specific handler;
    it also shows how conditional <code>GET</code>s can be handled,
    if it's desirable to do so in a particular response handler ---
    <code>ap_set_last_modified</code> checks against the
    <code>If-modified-since</code> value supplied by the client, if
    any, and returns an appropriate code (which will, if nonzero,
    be USE_LOCAL_COPY).
    No similar considerations apply for
    <code>ap_set_content_length</code>, but it returns an error
    code for symmetry.</p>
<pre>
int default_handler (request_rec *r)
{
    int errstatus;
    FILE *f;

    if (r->method_number != M_GET) return DECLINED;
    if (r->finfo.st_mode == 0) return NOT_FOUND;

    if ((errstatus = ap_set_content_length (r, r->finfo.st_size))) {
        return errstatus;
    }

    r->mtime = r->finfo.st_mtime;
    ap_set_last_modified (r);

    f = ap_pfopen (r->pool, r->filename, "r");

    if (f == NULL) {
        ap_log_rerror(APLOG_MARK, APLOG_ERR, r,
                      "file permissions deny server access: %s", r->filename);
        return FORBIDDEN;
    }

    ap_soft_timeout ("send", r);
    ap_send_http_header (r);

    if (!r->header_only) ap_send_fd (f, r);
    ap_pfclose (r->pool, f);

    ap_kill_timeout (r);
    return OK;
}
</pre>
    Finally, if all of this is too much of a challenge, there are a
    few ways out of it. First off, as shown above, a response
    handler which has not yet produced any output can simply return
    an error code, in which case the server will automatically
    produce an error response. Secondly, it can punt to some other
    handler by invoking <code>ap_internal_redirect</code>, which is
    how the internal redirection machinery discussed above is
    invoked. A response handler which has internally redirected
    should always return <code>OK</code>.
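    <p>The ordering rules for a response handler (decline or return
    an error code before any output, then send the header, then
    send a body unless the request is header-only) can be tried out
    without a server. The following is a rough sketch under
    assumptions: <code>mini_request</code>,
    <code>mini_default_handler</code> and <code>mini_run</code> are
    hypothetical, the "client" is just a character buffer, and the
    header written is a single fixed line.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for illustration; not the real Apache declarations. */
#define OK         0
#define DECLINED  -1
#define NOT_FOUND  404
#define M_GET      0
#define M_POST     2

typedef struct {
    int method_number;
    int header_only;   /* nonzero for a HEAD-style request */
    int file_exists;   /* stands in for the finfo.st_mode check */
    char out[256];     /* stands in for the connection to the client */
} mini_request;

/* Mirrors the shape of a response handler: checks first, output last. */
int mini_default_handler(mini_request *r, const char *body)
{
    size_t used;

    assert(r != NULL && body != NULL);
    r->out[0] = '\0';

    /* Nothing has been sent yet, so declining or erroring is still safe. */
    if (r->method_number != M_GET) return DECLINED;
    if (!r->file_exists) return NOT_FOUND;

    /* The header always goes first. */
    snprintf(r->out, sizeof r->out, "Content-Type: text/html\r\n\r\n");

    /* A header-only request stops here, with no body. */
    if (r->header_only) return OK;

    used = strlen(r->out);
    snprintf(r->out + used, sizeof r->out - used, "%s", body);
    return OK;
}

/* Helper: run the handler with a fixed body and copy what was "sent". */
int mini_run(int method_number, int header_only, int file_exists,
             char *buf, size_t buflen)
{
    mini_request r;
    int status;

    r.method_number = method_number;
    r.header_only = header_only;
    r.file_exists = file_exists;
    status = mini_default_handler(&r, "hello");
    snprintf(buf, buflen, "%s", r.out);
    return status;
}
```

    <p>Note that both early returns leave the output buffer empty,
    which is the property that lets the server produce its own error
    response afterwards.</p>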

    <p>(Invoking <code>ap_internal_redirect</code> from handlers
    which are <em>not</em> response handlers will lead to serious
    confusion).</p>

    <h3><a id="auth_handlers" name="auth_handlers">Special
    considerations for authentication handlers</a></h3>
    Stuff that should be discussed here in detail:

    <ul>
      <li>Authentication-phase handlers are not invoked unless auth
      is configured for the directory.</li>

      <li>Common auth configuration is stored in the core per-dir
      configuration; it has accessors <code>ap_auth_type</code>,
      <code>ap_auth_name</code>, and <code>ap_requires</code>.</li>

      <li>Common routines, to handle the protocol end of things, at
      least for HTTP basic authentication
      (<code>ap_get_basic_auth_pw</code>, which sets the
      <code>connection->user</code> structure field
      automatically, and <code>ap_note_basic_auth_failure</code>,
      which arranges for the proper <code>WWW-Authenticate:</code>
      header to be sent back).</li>
    </ul>

    <h3><a id="log_handlers" name="log_handlers">Special
    considerations for logging handlers</a></h3>
    When a request has internally redirected, there is the question
    of what to log. Apache handles this by bundling the entire
    chain of redirects into a list of <code>request_rec</code>
    structures which are threaded through the
    <code>r->prev</code> and <code>r->next</code> pointers.
    The <code>request_rec</code> which is passed to the logging
    handlers in such cases is the one which was originally built
    for the initial request from the client; note that the
    <code>bytes_sent</code> field will only be correct in the last
    request in the chain (the one for which a response was actually
    sent).
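    <p>A logging handler that wants the byte count therefore has to
    walk to the end of the redirect chain. The sketch below shows
    that walk on a hypothetical <code>mini_request</code> with only
    the <code>prev</code>, <code>next</code>, <code>uri</code> and
    <code>bytes_sent</code> fields; the helper names are
    illustrative assumptions, not functions the server provides.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down request record with just the chain fields. */
typedef struct mini_request {
    struct mini_request *prev;  /* request that redirected to this one */
    struct mini_request *next;  /* request this one redirected to */
    const char *uri;
    long bytes_sent;
} mini_request;

/* Follow next pointers to the request that actually produced the response. */
const mini_request *last_in_chain(const mini_request *r)
{
    assert(r != NULL);
    while (r->next != NULL)
        r = r->next;
    return r;
}

/* Byte count to log, given the initial request handed to the logger. */
long logged_bytes(const mini_request *initial)
{
    return last_in_chain(initial)->bytes_sent;
}

/* Demo: /old was internally redirected to /new, which sent 1234 bytes. */
long demo_logged_bytes(void)
{
    mini_request first  = { NULL, NULL, "/old", 0 };
    mini_request second = { NULL, NULL, "/new", 1234 };

    first.next = &second;
    second.prev = &first;
    return logged_bytes(&first);
}
```

    <p>The URI to log is a separate decision: the initial request
    still carries what the client asked for, while the last one
    carries what was served.</p>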

    <h2><a id="pools" name="pools">Resource allocation and resource
    pools</a></h2>

    <p>One of the problems of writing and designing a server-pool
    server is that of preventing leakage, that is, allocating
    resources (memory, open files, <em>etc.</em>), without
    subsequently releasing them. The resource pool machinery is
    designed to make it easy to prevent this from happening, by
    allowing resources to be allocated in such a way that they are
    <em>automatically</em> released when the server is done with
    them.</p>

    <p>The way this works is as follows: the memory which is
    allocated, files opened, <em>etc.</em>, to deal with a
    particular request are tied to a <em>resource pool</em> which
    is allocated for the request. The pool is a data structure
    which itself tracks the resources in question.</p>

    <p>When the request has been processed, the pool is
    <em>cleared</em>. At that point, all the memory associated with
    it is released for reuse, all files associated with it are
    closed, and any other clean-up functions which are associated
    with the pool are run. When this is over, we can be confident
    that all the resources tied to the pool have been released, and
    that none of them have leaked.</p>

    <p>Server restarts, and allocation of memory and resources for
    per-server configuration, are handled in a similar way. There
    is a <em>configuration pool</em>, which keeps track of
    resources which were allocated while reading the server
    configuration files, and handling the commands therein (for
    instance, the memory that was allocated for per-server module
    configuration, log files and other files that were opened, and
    so forth).
    When the server restarts, and has to reread the
    configuration files, the configuration pool is cleared, and so
    the memory and file descriptors which were taken up by reading
    them the last time are made available for reuse.</p>

    <p>It should be noted that use of the pool machinery isn't
    generally obligatory, except for situations like logging
    handlers, where you really need to register cleanups to make
    sure that the log file gets closed when the server restarts
    (this is most easily done by using the function <code><a
    href="#pool-files">ap_pfopen</a></code>, which also arranges
    for the underlying file descriptor to be closed before any
    child processes, such as for CGI scripts, are
    <code>exec</code>ed), or in case you are using the timeout
    machinery (which isn't yet even documented here). However,
    there are two benefits to using it: resources allocated to a
    pool never leak (even if you allocate a scratch string, and
    just forget about it); also, for memory allocation,
    <code>ap_palloc</code> is generally faster than
    <code>malloc</code>.</p>

    <p>We begin here by describing how memory is allocated to
    pools, and then discuss how other resources are tracked by the
    resource pool machinery.</p>

    <h3>Allocation of memory in pools</h3>

    <p>Memory is allocated to pools by calling the function
    <code>ap_palloc</code>, which takes two arguments, one being a
    pointer to a resource pool structure, and the other being the
    amount of memory to allocate (in <code>char</code>s). Within
    handlers for handling requests, the most common way of getting
    a resource pool structure is by looking at the
    <code>pool</code> slot of the relevant
    <code>request_rec</code>; hence the repeated appearance of the
    following idiom in module code:</p>
<pre>
int my_handler(request_rec *r)
{
    struct my_structure *foo;
    ...

    foo = (struct my_structure *)ap_palloc (r->pool, sizeof(struct my_structure));
}
</pre>

    <p>Note that <em>there is no <code>ap_pfree</code></em> ---
    <code>ap_palloc</code>ed memory is freed only when the
    associated resource pool is cleared. This means that
    <code>ap_palloc</code> does not have to do as much accounting
    as <code>malloc()</code>; all it does in the typical case is to
    round up the size, bump a pointer, and do a range check.</p>

    <p>(It also raises the possibility that heavy use of
    <code>ap_palloc</code> could cause a server process to grow
    excessively large. There are two ways to deal with this, which
    are dealt with below; briefly, you can use <code>malloc</code>,
    and try to be sure that all of the memory gets explicitly
    <code>free</code>d, or you can allocate a sub-pool of the main
    pool, allocate your memory in the sub-pool, and clear it out
    periodically. The latter technique is discussed in the section
    on sub-pools below, and is used in the directory-indexing code,
    in order to avoid excessive storage allocation when listing
    directories with thousands of files).</p>

    <h3>Allocating initialized memory</h3>

    <p>There are functions which allocate initialized memory, and
    are frequently useful. The function <code>ap_pcalloc</code> has
    the same interface as <code>ap_palloc</code>, but clears out
    the memory it allocates before it returns it. The function
    <code>ap_pstrdup</code> takes a resource pool and a <code>char
    *</code> as arguments, and allocates memory for a copy of the
    string the pointer points to, returning a pointer to the copy.
    Finally <code>ap_pstrcat</code> is a varargs-style function,
    which takes a pointer to a resource pool, and at least two
    <code>char *</code> arguments, the last of which must be
    <code>NULL</code>.
    It allocates enough memory to fit copies of
    each of the strings, as a unit; for instance:</p>
<pre>
     ap_pstrcat (r->pool, "foo", "/", "bar", NULL);
</pre>

    <p>returns a pointer to 8 bytes worth of memory, initialized to
    <code>"foo/bar"</code>.</p>

    <h3><a id="pools-used" name="pools-used">Commonly-used pools in
    the Apache Web server</a></h3>

    <p>A pool is really defined by its lifetime more than anything
    else. There are some static pools in http_main which are passed
    to various non-http_main functions as arguments at opportune
    times. Here they are:</p>

    <dl compact="compact">
      <dt>permanent_pool</dt>

      <dd>
        <ul>
          <li>never passed to anything else, this is the ancestor
          of all pools</li>
        </ul>
      </dd>

      <dt>pconf</dt>

      <dd>
        <ul>
          <li>subpool of permanent_pool</li>

          <li>created at the beginning of a config "cycle"; exists
          until the server is terminated or restarts; passed to all
          config-time routines, either via cmd->pool, or as the
          "pool *p" argument on those which don't take pools</li>

          <li>passed to the module init() functions</li>
        </ul>
      </dd>

      <dt>ptemp</dt>

      <dd>
        <ul>
          <li>sorry I lie, this pool isn't called this currently in
          1.3, I renamed it this in my pthreads development. I'm
          referring to the use of ptrans in the parent... contrast
          this with the later definition of ptrans in the
          child.</li>

          <li>subpool of permanent_pool</li>

          <li>created at the beginning of a config "cycle"; exists
          until the end of config parsing; passed to config-time
          routines <em>via</em> cmd->temp_pool. Somewhat of a
          "bastard child" because it isn't available everywhere.
          Used for temporary scratch space which may be needed by
          some config routines but which is deleted at the end of
          config.</li>
        </ul>
      </dd>

      <dt>pchild</dt>

      <dd>
        <ul>
          <li>subpool of permanent_pool</li>

          <li>created when a child is spawned (or a thread is
          created); lives until that child (thread) is
          destroyed</li>

          <li>passed to the module child_init functions</li>

          <li>destruction happens right after the child_exit
          functions are called... (which may explain why I think
          child_exit is redundant and unneeded)</li>
        </ul>
      </dd>

      <dt>ptrans</dt>

      <dd>
        <ul>
          <li>should be a subpool of pchild, but currently is a
          subpool of permanent_pool, see above</li>

          <li>cleared by the child before going into the accept()
          loop to receive a connection</li>

          <li>used as connection->pool</li>
        </ul>
      </dd>

      <dt>r->pool</dt>

      <dd>
        <ul>
          <li>for the main request this is a subpool of
          connection->pool; for subrequests it is a subpool of
          the parent request's pool.</li>

          <li>exists until the end of the request (<em>i.e.</em>,
          ap_destroy_sub_req, or in child_main after
          process_request has finished)</li>

          <li>note that r itself is allocated from r->pool;
          <em>i.e.</em>, r->pool is first created and then r is
          the first thing palloc()d from it</li>
        </ul>
      </dd>
    </dl>

    <p>For almost everything folks do, r->pool is the pool to
    use. But you can see how other lifetimes, such as pchild, are
    useful to some modules...
    such as modules that need to open a
    database connection once per child, and wish to clean it up
    when the child dies.</p>

    <p>You can also see how some bugs have manifested themselves,
    such as setting connection->user to a value from r->pool
    -- in this case connection exists for the lifetime of ptrans,
    which is longer than r->pool (especially if r->pool is a
    subrequest!). So the correct thing to do is to allocate from
    connection->pool.</p>

    <p>And there was another interesting bug in
    mod_include/mod_cgi. You'll see in those that they do this test
    to decide if they should use r->pool or r->main->pool.
    In this case the resource that they are registering for cleanup
    is a child process. If it were registered in r->pool, then
    the code would wait() for the child when the subrequest
    finishes. With mod_include this could be any old #include, and
    the delay can be up to 3 seconds... and happened quite
    frequently. Instead the subprocess is registered in
    r->main->pool which causes it to be cleaned up when the
    entire request is done -- <em>i.e.</em>, after the output has
    been sent to the client and logging has happened.</p>

    <h3><a id="pool-files" name="pool-files">Tracking open files,
    etc.</a></h3>

    <p>As indicated above, resource pools are also used to track
    other sorts of resources besides memory. The most common are
    open files. The routine which is typically used for this is
    <code>ap_pfopen</code>, which takes a resource pool and two
    strings as arguments; the strings are the same as the typical
    arguments to <code>fopen</code>, <em>e.g.</em>,</p>
<pre>
     ...
     FILE *f = ap_pfopen (r->pool, r->filename, "r");

     if (f == NULL) { ... } else { ... }
</pre>

    <p>There is also a <code>ap_popenf</code> routine, which
    parallels the lower-level <code>open</code> system call.
  Both of these routines arrange for the file to be closed when the
  resource pool in question is cleared.</p>

  <p>Unlike the case for memory, there <em>are</em> functions to
  close files allocated with <code>ap_pfopen</code> and
  <code>ap_popenf</code>, namely <code>ap_pfclose</code> and
  <code>ap_pclosef</code>. (This is because, on many systems, the
  number of files which a single process can have open is quite
  limited). It is important to use these functions to close files
  allocated with <code>ap_pfopen</code> and
  <code>ap_popenf</code>, since to do otherwise could cause fatal
  errors on systems such as Linux, which react badly if the same
  <code>FILE*</code> is closed more than once.</p>

  <p>(Using the <code>close</code> functions is not mandatory,
  since the file will eventually be closed regardless, but you
  should consider it in cases where your module is opening, or
  could open, a lot of files).</p>

  <h3>Other sorts of resources --- cleanup functions</h3>

  <blockquote>
    More text goes here. Describe the cleanup primitives in
    terms of which the file stuff is implemented; also,
    <code>spawn_process</code>.
  </blockquote>

  <p>Pool cleanups live until clear_pool() is called:
  clear_pool(a) recursively calls destroy_pool() on all subpools
  of a; then calls all the cleanups for a; then releases all the
  memory for a. destroy_pool(a) calls clear_pool(a) and then
  releases the pool structure itself. In other words,
  clear_pool(a) doesn't delete a; it just frees up all the
  resources, and you can start using it again immediately.</p>

  <h3>Fine control --- creating and dealing with sub-pools, with
  a note on sub-requests</h3>
  On rare occasions, too-free use of <code>ap_palloc()</code> and
  the associated primitives may result in undesirably profligate
  resource allocation.
  You can deal with such a case by creating
  a <em>sub-pool</em>, allocating within the sub-pool rather than
  the main pool, and clearing or destroying the sub-pool, which
  releases the resources which were associated with it. (This
  really <em>is</em> a rare situation; the only case in which it
  comes up in the standard module set is when listing
  directories, and then only with <em>very</em> large
  directories. Unnecessary use of the primitives discussed here
  can hair up your code quite a bit, with very little gain).

  <p>The primitive for creating a sub-pool is
  <code>ap_make_sub_pool</code>, which takes another pool (the
  parent pool) as an argument. When the parent pool is cleared,
  the sub-pool will be destroyed. The sub-pool may also be cleared
  or destroyed at any time, by calling the functions
  <code>ap_clear_pool</code> and <code>ap_destroy_pool</code>,
  respectively. (The difference is that
  <code>ap_clear_pool</code> frees resources associated with the
  pool, while <code>ap_destroy_pool</code> also deallocates the
  pool itself. In the former case, you can allocate new resources
  within the pool, and clear it again, and so forth; in the
  latter case, it is simply gone).</p>

  <p>One final note --- sub-requests have their own resource
  pools, which are sub-pools of the resource pool for the main
  request. The polite way to reclaim the resources associated
  with a sub-request which you have allocated (using the
  <code>ap_sub_req_...</code> functions) is
  <code>ap_destroy_sub_req</code>, which frees the resource pool.
  Before calling this function, be sure to copy anything that you
  care about which might be allocated in the sub-request's
  resource pool into someplace a little less volatile (for
  instance, the filename in its <code>request_rec</code>
  structure).</p>

  <p>(Again, under most circumstances, you shouldn't feel obliged
  to call this function; only 2K of memory or so are allocated
  for a typical sub request, and it will be freed anyway when the
  main request pool is cleared. It is only when you are
  allocating many, many sub-requests for a single main request
  that you should seriously consider the
  <code>ap_destroy_...</code> functions).</p>

  <h2><a id="config" name="config">Configuration, commands and
  the like</a></h2>
  One of the design goals for this server was to maintain
  external compatibility with the NCSA 1.3 server --- that is, to
  read the same configuration files, to process all the
  directives therein correctly, and in general to be a drop-in
  replacement for NCSA. On the other hand, another design goal
  was to move as much of the server's functionality into modules
  which have as little as possible to do with the monolithic
  server core. The only way to reconcile these goals is to move
  the handling of most commands from the central server into the
  modules.

  <p>However, just giving the modules command tables is not
  enough to divorce them completely from the server core. The
  server has to remember the commands in order to act on them
  later. That involves maintaining data which is private to the
  modules, and which can be either per-server, or per-directory.
  Most things are per-directory, including in particular access
  control and authorization information, but also information on
  how to determine file types from suffixes, which can be
  modified by <code>AddType</code> and <code>DefaultType</code>
  directives, and so forth.
  In general, the governing philosophy
  is that anything which <em>can</em> be made configurable by
  directory should be; per-server information is generally used
  in the standard set of modules for information like
  <code>Alias</code>es and <code>Redirect</code>s which come into
  play before the request is tied to a particular place in the
  underlying file system.</p>

  <p>Another requirement for emulating the NCSA server is being
  able to handle the per-directory configuration files, generally
  called <code>.htaccess</code> files, though even in the NCSA
  server they can contain directives which have nothing at all to
  do with access control. Accordingly, after URI -> filename
  translation, but before performing any other phase, the server
  walks down the directory hierarchy of the underlying
  filesystem, following the translated pathname, to read any
  <code>.htaccess</code> files which might be present. The
  information which is read in then has to be <em>merged</em>
  with the applicable information from the server's own config
  files (either from the <code>&lt;Directory&gt;</code> sections
  in <code>access.conf</code>, or from defaults in
  <code>srm.conf</code>, which actually behaves for most purposes
  almost exactly like <code>&lt;Directory /&gt;</code>).</p>

  <p>Finally, after having served a request which involved
  reading <code>.htaccess</code> files, we need to discard the
  storage allocated for handling them. That is solved the same
  way it is solved wherever else similar problems come up, by
  tying those structures to the per-transaction resource
  pool.</p>

  <h3><a id="per-dir" name="per-dir">Per-directory configuration
  structures</a></h3>
  Let's look at how all of this plays out in
  <code>mod_mime.c</code>, which defines the file typing handler
  which emulates the NCSA server's behavior of determining file
  types from suffixes.
  What we'll be looking at, here, is the
  code which implements the <code>AddType</code> and
  <code>AddEncoding</code> commands. These commands can appear in
  <code>.htaccess</code> files, so they must be handled in the
  module's private per-directory data, which, in fact, consists of
  two separate <code>table</code>s for MIME types and encoding
  information, and is declared as follows:
<pre>
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
</pre>
  When the server is reading a configuration file, or
  <code>&lt;Directory&gt;</code> section, which includes one of
  the MIME module's commands, it needs to create a
  <code>mime_dir_config</code> structure, so those commands have
  something to act on. It does this by invoking the function it
  finds in the module's 'create per-dir config' slot, with two
  arguments: the name of the directory to which this
  configuration information applies (or <code>NULL</code> for
  <code>srm.conf</code>), and a pointer to a resource pool in
  which the allocation should happen.

  <p>(If we are reading a <code>.htaccess</code> file, that
  resource pool is the per-request resource pool for the request;
  otherwise it is a resource pool which is used for configuration
  data, and cleared on restarts. Either way, it is important for
  the structure being created to vanish when the pool is cleared,
  by registering a cleanup on the pool if necessary).</p>

  <p>For the MIME module, the per-dir config creation function
  just <code>ap_palloc</code>s the structure above, and creates
  a couple of <code>table</code>s to fill it.
  That looks like
  this:</p>
<pre>
void *create_mime_dir_config (pool *p, char *dummy)
{
    mime_dir_config *new =
      (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));

    new->forced_types = ap_make_table (p, 4);
    new->encoding_types = ap_make_table (p, 4);

    return new;
}
</pre>
  Now, suppose we've just read in a <code>.htaccess</code> file.
  We already have the per-directory configuration structure for
  the next directory up in the hierarchy. If the
  <code>.htaccess</code> file we just read in didn't have any
  <code>AddType</code> or <code>AddEncoding</code> commands, the
  parent directory's per-directory config structure for the MIME
  module is still valid, and we can just use it. Otherwise, we
  need to merge the two structures somehow.

  <p>To do that, the server invokes the module's per-directory
  config merge function, if one is present. That function takes
  three arguments: the two structures being merged, and a
  resource pool in which to allocate the result. For the MIME
  module, all that needs to be done is to overlay the tables from
  the new per-directory config structure with those from the
  parent:</p>
<pre>
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new =
      (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));

    new->forced_types = ap_overlay_tables (p, subdir->forced_types,
                                           parent_dir->forced_types);
    new->encoding_types = ap_overlay_tables (p, subdir->encoding_types,
                                             parent_dir->encoding_types);

    return new;
}
</pre>
  As a note --- if there is no per-directory merge function
  present, the server will just use the subdirectory's
  configuration info, and ignore the parent's.
  For some modules,
  that works just fine (<em>e.g.</em>, for the includes module,
  whose per-directory configuration information consists solely
  of the state of the <code>XBITHACK</code>), and for those
  modules, you can just not declare one, and leave the
  corresponding structure slot in the module itself
  <code>NULL</code>.

  <h3><a id="commands" name="commands">Command handling</a></h3>
  Now that we have these structures, we need to be able to figure
  out how to fill them. That involves processing the actual
  <code>AddType</code> and <code>AddEncoding</code> commands. To
  find commands, the server looks in the module's <code>command
  table</code>. That table contains information on how many
  arguments the commands take, and in what formats, where they are
  permitted, and so forth. That information is sufficient to
  allow the server to invoke most command-handling functions with
  pre-parsed arguments. Without further ado, let's look at the
  <code>AddType</code> command handler, which looks like this
  (the <code>AddEncoding</code> command looks basically the same,
  and won't be shown here):
<pre>
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == '.') ++ext;   /* skip a leading dot in the extension */
    ap_table_set (m->forced_types, ext, ct);
    return NULL;              /* NULL means no error */
}
</pre>
  This command handler is unusually simple. As you can see, it
  takes four arguments: two pre-parsed arguments (the last two
  parameters), the per-directory configuration structure for
  the module in question (the second parameter), and, as its
  first parameter, a pointer to a
  <code>cmd_parms</code> structure.
  That structure contains a
  bunch of arguments which are frequently of use to some, but not
  all, commands, including a resource pool (from which memory can
  be allocated, and to which cleanups should be tied), and the
  (virtual) server being configured, from which the module's
  per-server configuration data can be obtained if required.

  <p>Another way in which this particular command handler is
  unusually simple is that there are no error conditions which it
  can encounter. If there were, it could return an error message
  instead of <code>NULL</code>; this causes an error to be
  printed out on the server's <code>stderr</code>, followed by a
  quick exit, if it is in the main config files; for a
  <code>.htaccess</code> file, the syntax error is logged in the
  server error log (along with an indication of where it came
  from), and the request is bounced with a server error response
  (HTTP error status, code 500).</p>

  <p>The MIME module's command table has entries for these
  commands, which look like this:</p>
<pre>
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
    "a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
    "an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
</pre>
  The entries in these tables are:

  <ul>
    <li>The name of the command</li>

    <li>The function which handles it</li>

    <li>A <code>(void *)</code> pointer, which is passed in the
    <code>cmd_parms</code> structure to the command handler ---
    this is useful in case many similar commands are handled by
    the same function.</li>

    <li>A bit mask indicating where the command may appear.
    There
    are mask bits corresponding to each
    <code>AllowOverride</code> option, and an additional mask
    bit, <code>RSRC_CONF</code>, indicating that the command may
    appear in the server's own config files, but <em>not</em> in
    any <code>.htaccess</code> file.</li>

    <li>A flag indicating how many arguments the command handler
    wants pre-parsed, and how they should be passed in.
    <code>TAKE2</code> indicates two pre-parsed arguments. Other
    options are <code>TAKE1</code>, which indicates one
    pre-parsed argument; <code>FLAG</code>, which indicates that
    the argument should be <code>On</code> or <code>Off</code>,
    and is passed in as a boolean flag; and <code>RAW_ARGS</code>,
    which causes the server to give the command the raw, unparsed
    arguments (everything but the command name itself). There is
    also <code>ITERATE</code>, which means that the handler looks
    the same as <code>TAKE1</code>, but that if multiple
    arguments are present, it should be called multiple times,
    and finally <code>ITERATE2</code>, which indicates that the
    command handler looks like a <code>TAKE2</code>, but if more
    arguments are present, then it should be called multiple
    times, holding the first argument constant.</li>

    <li>Finally, we have a string which describes the arguments
    that should be present. If the arguments in the actual config
    file are not as required, this string will be used to help
    give a more specific error message. (You can safely leave
    this <code>NULL</code>).</li>
  </ul>
  Finally, having set this all up, we have to use it.
  This is
  ultimately done in the module's handlers, specifically in its
  file-typing handler, which looks more or less like this; note
  that the per-directory configuration structure is extracted
  from the <code>request_rec</code>'s per-directory configuration
  vector by using the <code>ap_get_module_config</code> function.

<pre>
int find_ct(request_rec *r)
{
    int i;
    char *fn = ap_pstrdup (r->pool, r->filename);
    mime_dir_config *conf = (mime_dir_config *)
             ap_get_module_config(r->per_dir_config, &mime_module);
    char *type;

    if (S_ISDIR(r->finfo.st_mode)) {
        r->content_type = DIR_MAGIC_TYPE;
        return OK;
    }

    if((i=ap_rind(fn,'.')) < 0) return DECLINED;
    ++i;

    if ((type = ap_table_get (conf->encoding_types, &fn[i])))
    {
        r->content_encoding = type;

        /* go back to previous extension to try to use it as a type */

        fn[i-1] = '\0';
        if((i=ap_rind(fn,'.')) < 0) return OK;
        ++i;
    }

    if ((type = ap_table_get (conf->forced_types, &fn[i])))
    {
        r->content_type = type;
    }

    return OK;
}

</pre>

  <h3><a id="servconf" name="servconf">Side notes --- per-server
  configuration, virtual servers, <em>etc</em>.</a></h3>
  The basic ideas behind per-server module configuration are
  essentially the same as those for per-directory configuration;
  there is a creation function and a merge function, the latter
  being invoked where a virtual server has partially overridden
  the base server configuration, and a combined structure must be
  computed. (As with per-directory configuration, the default if
  no merge function is specified, and a module is configured in
  some virtual server, is that the base configuration is simply
  ignored).
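
  <p>To make the creation and merge steps concrete, here is a
  minimal, self-contained sketch in plain C. It is not taken from
  any Apache module: the <code>demo_server_conf</code> type, its
  fields, and both function names are hypothetical, and
  <code>malloc</code> stands in for the allocation a real module
  would make in the resource pool it is handed. The sketch only
  illustrates one plausible merge policy, in which a setting made
  in the virtual server wins and anything left unset is inherited
  from the base server.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-server record; NOT an Apache type. */
typedef struct {
    const char *doc_alias;  /* NULL means "not set in this server" */
    int max_hops;           /* -1 means "not set in this server"   */
} demo_server_conf;

/* Creation function: every setting starts out marked as unset.
 * (Assumption: malloc replaces the pool allocation of a real module.) */
demo_server_conf *demo_create_server_conf(void)
{
    demo_server_conf *conf = malloc(sizeof *conf);

    assert(conf != NULL);
    conf->doc_alias = NULL;
    conf->max_hops = -1;
    return conf;
}

/* Merge function: build a new record in which the virtual server's
 * value is used when it was set, and the base server's otherwise. */
demo_server_conf *demo_merge_server_conf(const demo_server_conf *base,
                                         const demo_server_conf *virt)
{
    demo_server_conf *merged = demo_create_server_conf();

    merged->doc_alias = virt->doc_alias ? virt->doc_alias : base->doc_alias;
    merged->max_hops = (virt->max_hops >= 0) ? virt->max_hops : base->max_hops;
    return merged;
}
```

  <p>With a base server that sets both fields and a virtual server
  that overrides only <code>max_hops</code>, the merged record
  keeps the base alias and takes the virtual server's hop limit.
  Leaving out the merge step corresponds to the default described
  above, in which the base configuration is simply ignored.</p>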

  <p>The only substantial difference is that when a command needs
  to configure the per-server private module data, it needs to go
  to the <code>cmd_parms</code> data to get at it. Here's an
  example, from the alias module, which also indicates how a
  syntax error can be returned (note that the per-directory
  configuration argument to the command handler is declared as a
  dummy, since the module doesn't actually have per-directory
  config data):</p>
<pre>
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
{
    server_rec *s = cmd->server;
    alias_server_conf *conf = (alias_server_conf *)
            ap_get_module_config(s->module_config,&alias_module);
    alias_entry *new = ap_push_array (conf->redirects);

    if (!ap_is_url (url)) return "Redirect to non-URL";

    new->fake = f; new->real = url;
    return NULL;
}
</pre>

  </body>
</html>