author: Graham Leggett <minfrin@apache.org> 2011-12-06 01:54:08 +0100
committer: Graham Leggett <minfrin@apache.org> 2011-12-06 01:54:08 +0100
commit: cf7a851686a12e2802e8e79b997c1a65326abc8a
tree: 77b760189e634651ed1d6a5cc08038fe781f9e97 /docs/manual/caching.xml
parent: No need for process.h system include since we don't use getpid() any more
Overhaul the caching guide in an effort to clearly distinguish between
the mod_cache caching, the socache caching, and other caching we do, such as mod_file_cache.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@1210725 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'docs/manual/caching.xml')
-rw-r--r-- docs/manual/caching.xml 1058
1 files changed, 636 insertions, 422 deletions
diff --git a/docs/manual/caching.xml b/docs/manual/caching.xml
index 20d5619f2a..511c672145 100644
--- a/docs/manual/caching.xml
+++ b/docs/manual/caching.xml
@@ -34,39 +34,50 @@
<section id="introduction">
<title>Introduction</title>
-
- <p>As of Apache HTTP server version 2.2 <module>mod_cache</module>
- and <module>mod_file_cache</module> are no longer marked
- experimental and are considered suitable for production use. These
- caching architectures provide a powerful means to accelerate HTTP
- handling, both as an origin webserver and as a proxy.</p>
-
- <p><module>mod_cache</module> and its provider modules
- <module>mod_cache_disk</module>
- provide intelligent, HTTP-aware caching. The content itself is stored
- in the cache, and mod_cache aims to honor all of the various HTTP
- headers and options that control the cachability of content. It can
- handle both local and proxied content. <module>mod_cache</module>
- is aimed at both simple and complex caching configurations, where
- you are dealing with proxied content, dynamic local content or
- have a need to speed up access to local files which change with
- time.</p>
-
- <p><module>mod_file_cache</module> on the other hand presents a more
- basic, but sometimes useful, form of caching. Rather than maintain
- the complexity of actively ensuring the cachability of URLs,
- <module>mod_file_cache</module> offers file-handle and memory-mapping
- tricks to keep a cache of files as they were when httpd was last
- started. As such, <module>mod_file_cache</module> is aimed at improving
- the access time to local static files which do not change very
- often.</p>
-
- <p>As <module>mod_file_cache</module> presents a relatively simple
- caching implementation, apart from the specific sections on <directive
- module="mod_file_cache">CacheFile</directive> and <directive
- module="mod_file_cache">MMapFile</directive>, the explanations
- in this guide cover the <module>mod_cache</module> caching
- architecture.</p>
+
+ <p>The Apache HTTP server offers a range of caching features that
+ are designed to improve the performance of the server in various
+ ways.</p>
+
+ <dl>
+ <dt>Three-state RFC2616 HTTP caching</dt>
+ <dd>
+ <module>mod_cache</module>
+ and its provider modules
+ <module>mod_cache_disk</module>
+ provide intelligent, HTTP-aware caching. The content itself is stored
+ in the cache, and mod_cache aims to honor all of the various HTTP
+ headers and options that control the cacheability of content
+ as described in
+ <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">Section
+ 13 of RFC2616</a>.
+ <module>mod_cache</module>
+ is aimed at both simple and complex caching configurations, where
+ you are dealing with proxied content, dynamic local content or
+ have a need to speed up access to local files on a potentially
+ slow disk.
+ </dd>
+
+ <dt>Two-state key/value shared object caching</dt>
+ <dd>
+ <module>mod_socache</module>
+ and its provider modules provide a
+ server wide key/value based shared object cache. These modules
+ are designed to cache low level data such as SSL sessions and
+ authentication credentials. Backends allow the data to be stored
+ server wide in shared memory, or datacenter wide in a cache such
+ as memcache or distcache.
+ </dd>
+
+ <dt>Specialized file caching</dt>
+ <dd>
+ <module>mod_file_cache</module>
+ offers the ability to pre-load
+ files into memory on server startup, and can improve access
+ times and save file handles on files that are accessed often,
+ as there is no need to go to disk on each request.
+ </dd>
+ </dl>
<p>To get the most from this document, you should be familiar with
the basics of HTTP, and have read the Users' Guides to
@@ -75,102 +86,182 @@
</section>
- <section id="overview">
+ <section id="http-caching">
- <title>Caching Overview</title>
+ <title>Three-state RFC2616 HTTP caching</title>
<related>
<modulelist>
<module>mod_cache</module>
<module>mod_cache_disk</module>
- <module>mod_file_cache</module>
</modulelist>
<directivelist>
<directive module="mod_cache">CacheEnable</directive>
<directive module="mod_cache">CacheDisable</directive>
- <directive module="mod_file_cache">CacheFile</directive>
- <directive module="mod_file_cache">MMapFile</directive>
<directive module="core">UseCanonicalName</directive>
<directive module="mod_negotiation">CacheNegotiatedDocs</directive>
</directivelist>
</related>
- <p>There are two main stages in <module>mod_cache</module> that can
- occur in the lifetime of a request. First, <module>mod_cache</module>
- is a URL mapping module, which means that if a URL has been cached,
- and the cached version of that URL has not expired, the request will
- be served directly by <module>mod_cache</module>.</p>
-
- <p>This means that any other stages that might ordinarily happen
- in the process of serving a request -- for example being handled
- by <module>mod_proxy</module>, or <module>mod_rewrite</module> --
- won't happen. But then this is the point of caching content in
- the first place.</p>
-
- <p>If the URL is not found within the cache, <module>mod_cache</module>
- will add a <a href="filter.html">filter</a> to the request handling. After
- httpd has located the content by the usual means, the filter will be run
- as the content is served. If the content is determined to be cacheable,
- the content will be saved to the cache for future serving.</p>
-
- <p>If the URL is found within the cache, but also found to have expired,
- the filter is added anyway, but <module>mod_cache</module> will create
- a conditional request to the backend, to determine if the cached version
- is still current. If the cached version is still current, its
- meta-information will be updated and the request will be served from the
- cache. If the cached version is no longer current, the cached version
- will be deleted and the filter will save the updated content to the cache
- as it is served.</p>
+ <p>The HTTP protocol contains built-in support for an in-line caching
+ mechanism
+ <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">
+ described by section 13 of RFC2616</a>, and the
+ <module>mod_cache</module> module can be used to take advantage of
+ this.</p>
+
+ <p>Unlike a simple two-state key/value cache where the content
+ disappears completely when no longer fresh, an HTTP cache includes
+ a mechanism to retain stale content, and to ask the origin server
+ whether this stale content has changed and if not, make it fresh
+ again.</p>
+
+ <p>An entry in an HTTP cache exists in one of three states:</p>
+
+ <dl>
+ <dt>Fresh</dt>
+ <dd>
+ If the content is new enough (younger than its <strong>freshness
+ lifetime</strong>), it is considered <strong>fresh</strong>. An
+ HTTP cache is free to serve fresh content without making any
+ calls to the origin server at all.
+ </dd>
+ <dt>Stale</dt>
+ <dd>
+ <p>If the content is too old (older than its <strong>freshness
+ lifetime</strong>), it is considered <strong>stale</strong>. An
+ HTTP cache should contact the origin server and check whether
+ the content is still fresh before serving stale content to a
+ client. The origin server will either respond with replacement
+ content if not still valid, or ideally, the origin server will
+ respond with a code to tell the cache the content is still
+ fresh, without the need to generate or send the content again.
+ The content becomes fresh again and the cycle continues.</p>
+
+ <p>The HTTP protocol does allow the cache to serve stale data
+ under certain circumstances, such as when an attempt to freshen
+ the data with an origin server has failed with a 5xx error, or
+ when another request is already in the process of freshening
+ the given entry. In these cases a <code>Warning</code> header
+ is added to the response.</p>
+ </dd>
+ <dt>Non Existent</dt>
+ <dd>
+ If the cache gets full, it reserves the option to delete content
+ from the cache to make space. Content can be deleted at any time,
+ and can be stale or fresh. The <a
+ href="programs/htcacheclean.html">htcacheclean</a> tool can be
+ run on a once off basis, or deployed as a daemon to keep the size
+ of the cache within the given size, or the given number of inodes.
+ The tool attempts to delete stale content before attempting to
+ delete fresh content.
+ </dd>
+ </dl>
+
+ <p>Full details of how HTTP caching works can be found in
+ <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html">
+ Section 13 of RFC2616</a>.</p>
+
+ <section>
+ <title>Interaction with the Server</title>
+
+ <p>The <module>mod_cache</module> module hooks into the server in two
+ possible places depending on the value of the
+ <directive module="mod_cache">CacheQuickHandler</directive> directive:
+ </p>
+
+ <dl>
+ <dt>Quick handler phase</dt>
+ <dd>
+ <p>This phase happens very early on during the request processing,
+ just after the request has been parsed. If the content is
+ found within the cache, it is served immediately and almost
+ all request processing is bypassed.</p>
+
+ <p>In this scenario, the cache behaves as if it has been "bolted
+ on" to the front of the server.</p>
+
+ <p>This mode offers the best performance, as the majority of
+ server processing is bypassed. This mode however also bypasses the
+ authentication and authorization phases of server processing, so
+ this mode should be chosen with care when this is important.</p>
+ </dd>
+ <dt>Normal handler phase</dt>
+ <dd>
+ <p>This phase happens late in the request processing, after all
+ the request phases have completed.</p>
+
+ <p>In this scenario, the cache behaves as if it has been "bolted
+ on" to the back of the server.</p>
+
+ <p>This mode offers the most flexibility, as the potential exists
+ for caching to occur at a precisely controlled point in the filter
+ chain, and cached content can be filtered or personalized before
+ being sent to the client.</p>
+ </dd>
+ </dl>
+
+ <p>If the URL is not found within the cache, <module>mod_cache</module>
+ will add a <a href="filter.html">filter</a> to the filter stack in order
+ to record the response to the cache, and then stand down, allowing normal
+ request processing to continue. If the content is determined to be
+ cacheable, the content will be saved to the cache for future serving,
+ otherwise the content will be ignored.</p>
+
+ <p>If the content found within the cache is stale, the
+ <module>mod_cache</module> module converts the request into a
+ <strong>conditional request</strong>. If the origin server responds with
+ a normal response, the normal response is cached, replacing the content
+ already cached. If the origin server responds with a 304 Not Modified
+ response, the content is marked as fresh again, and the cached content
+ is served by the filter instead of saving it.</p>
+ </section>
<section>
<title>Improving Cache Hits</title>
- <p>When caching locally generated content, ensuring that
- <directive module="core">UseCanonicalName</directive> is set to
- <code>On</code> can dramatically improve the ratio of cache hits. This
- is because the hostname of the virtual-host serving the content forms
- a part of the cache key. With the setting set to <code>On</code>
+ <p>When a virtual host is known by one of many different server aliases,
+ ensuring that <directive module="core">UseCanonicalName</directive> is
+ set to <code>On</code> can dramatically improve the ratio of cache hits.
+ This is because the hostname of the virtual-host serving the content is
+ used within the cache key. With the setting set to <code>On</code>
virtual-hosts with multiple server names or aliases will not produce
differently cached entities, and instead content will be cached as
per the canonical hostname.</p>
- <p>Because caching is performed within the URL to filename translation
- phase, cached documents will only be served in response to URL requests.
- Ordinarily this is of little consequence, but there is one circumstance
- in which it matters: If you are using <a href="howto/ssi.html">Server
- Side Includes</a>;</p>
-
- <example>
-&lt;!-- The following include can be cached --&gt;<br />
-&lt;!--#include virtual="/footer.html" --&gt; <br />
-<br />
-&lt;!-- The following include can not be cached --&gt;<br />
-&lt;!--#include file="/path/to/footer.html" --&gt;
- </example>
-
- <p>If you are using Server Side Includes, and want the benefit of speedy
- serves from the cache, you should use <code>virtual</code> include
- types.</p>
</section>
<section>
- <title>Expiry Periods</title>
-
- <p>The default expiry period for cached entities is one hour, however
+ <title>Freshness Lifetime</title>
+
+ <p>Well formed content that is intended to be cached should declare an
+ explicit freshness lifetime with the <code>Cache-Control</code>
+ header's <code>max-age</code> or <code>s-maxage</code> fields, or
+ by including an <code>Expires</code> header.</p>
+
+ <p>At the same time, the origin server defined freshness lifetime can
+ be overridden by a client when the client presents their own
+ <code>Cache-Control</code> header within the request. In this case,
+ the lowest freshness lifetime between request and response wins.</p>
+
+ <p>When this freshness lifetime is missing from the request or the
+ response, a default freshness lifetime is applied. The default
+ freshness lifetime for cached entities is one hour, however
this can be easily over-ridden by using the <directive
- module="mod_cache">CacheDefaultExpire</directive> directive. This
- default is only used when the original source of the content does not
- specify an expire time or time of last modification.</p>
+ module="mod_cache">CacheDefaultExpire</directive> directive.</p>
<p>If a response does not include an <code>Expires</code> header but does
include a <code>Last-Modified</code> header, <module>mod_cache</module>
- can infer an expiry period based on the use of the <directive
+ can infer a freshness lifetime based on a heuristic, which can be
+ controlled through the use of the <directive
module="mod_cache">CacheLastModifiedFactor</directive> directive.</p>
- <p>For local content, <module>mod_expires</module> may be used to
- fine-tune the expiry period.</p>
+ <p>For local content, or for remote content that does not define its own
+ <code>Expires</code> header, <module>mod_expires</module> may be used to
+ fine-tune the freshness lifetime by adding <code>max-age</code> and
+ <code>Expires</code>.</p>
- <p>The maximum expiry period may also be controlled by using the
+ <p>The maximum freshness lifetime may also be controlled by using the
<directive module="mod_cache">CacheMaxExpire</directive>.</p>
</section>
@@ -178,58 +269,60 @@
<section>
<title>A Brief Guide to Conditional Requests</title>
- <p>When content expires from the cache and is re-requested from the
- backend or content provider, rather than pass on the original request,
- httpd will use a conditional request instead.</p>
-
- <p>HTTP offers a number of headers which allow a client, or cache
- to discern between different versions of the same content. For
- example if a resource was served with an "Etag:" header, it is
- possible to make a conditional request with an "If-None-Match:"
- header. If a resource was served with a "Last-Modified:" header
- it is possible to make a conditional request with an
- "If-Modified-Since:" header, and so on.</p>
-
- <p>When such a conditional request is made, the response differs
- depending on whether the content matches the conditions. If a request is
- made with an "If-Modified-Since:" header, and the content has not been
- modified since the time indicated in the request then a terse "304 Not
- Modified" response is issued.</p>
-
- <p>If the content has changed, then it is served as if the request were
- not conditional to begin with.</p>
-
- <p>The benefits of conditional requests in relation to caching are
- twofold. Firstly, when making such a request to the backend, if the
- content from the backend matches the content in the store, this can be
- determined easily and without the overhead of transferring the entire
- resource.</p>
-
- <p>Secondly, conditional requests are usually less strenuous on the
- backend. For static files, typically all that is involved is a call
- to <code>stat()</code> or similar system call, to see if the file has
- changed in size or modification time. As such, even if httpd is
- caching local content, even expired content may still be served faster
- from the cache if it has not changed. As long as reading from the cache
- store is faster than reading from the backend (e.g. <module
- >mod_cache_disk</module> with memory disk
- compared to reading from disk).</p>
+ <p>When content expires from the cache and becomes stale, rather than
+ pass on the original request, httpd will modify the request to make
+ it conditional instead.</p>
+
+ <p>When an <code>ETag</code> header exists in the original cached
+ response, <module>mod_cache</module> will add an
+ <code>If-None-Match</code> header to the request to the origin server.
+ When a <code>Last-Modified</code> header exists in the original
+ cached response, <module>mod_cache</module> will add an
+ <code>If-Modified-Since</code> header to the request to the origin
+ server. Performing either of these actions makes the request
+ <strong>conditional</strong>.</p>
+
+ <p>When a conditional request is received by an origin server, the
+ origin server should check whether the ETag or the Last-Modified
+ parameter has changed, as appropriate for the request. If not, the
+ origin should respond with a terse "304 Not Modified" response. This
+ signals to the cache that the stale content is still fresh and should be
+ used for subsequent requests until the content's new freshness lifetime
+ is reached again.</p>
+
+ <p>If the content has changed, then the content is served as if the
+ request were not conditional to begin with.</p>
+
+ <p>Conditional requests offer two benefits. Firstly, when making such
+ a request to the origin server, if the content from the origin
+ matches the content in the cache, this can be determined easily and
+ without the overhead of transferring the entire resource.</p>
+
+ <p>Secondly, a well designed origin server will make conditional
+ requests significantly cheaper to answer than producing a full
+ response. For static files, typically all that is
+ involved is a call to <code>stat()</code> or similar system call, to
+ see if the file has changed in size or modification time. As such, even
+ local content may still be served faster from the cache if it has not
+ changed.</p>
+
+ <p>Origin servers should make every effort to support conditional
+ requests where practical. However, if conditional requests are not
+ supported, the origin will respond as if the request was not
+ conditional, and the cache will respond as if the content had changed
+ and save the new content to the cache. In this case, the cache will
+ behave like a simple two-state cache, where content is effectively
+ either fresh or deleted.</p>
</section>
<section>
<title>What Can be Cached?</title>
- <p>As mentioned already, the two styles of caching in httpd work
- differently, <module>mod_file_cache</module> caching maintains file
- contents as they were when httpd was started. When a request is
- made for a file that is cached by this module, it is intercepted
- and the cached file is served.</p>
-
- <p><module>mod_cache</module> caching on the other hand is more
- complex. When serving a request, if it has not been cached
- previously, the caching module will determine if the content
- is cacheable. The conditions for determining cachability of
- a response are;</p>
+ <p>The full definition of which responses can be cached by an HTTP
+ cache is defined in
+ <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4">
+ RFC2616 Section 13.4 Response Cacheability</a>, and can be summed up as
+ follows:</p>
<ol>
<li>Caching must be enabled for this URL. See the <directive
@@ -241,12 +334,9 @@
<li>The request must be a HTTP GET request.</li>
- <li>If the request contains an "Authorization:" header, the response
- will not be cached.</li>
-
<li>If the response contains an "Authorization:" header, it must
also contain an "s-maxage", "must-revalidate" or "public" option
- in the "Cache-Control:" header.</li>
+ in the "Cache-Control:" header, or it won't be cached.</li>
<li>If the URL included a query string (e.g. from a HTML form GET
method) it will not be cached unless the response specifies an
@@ -279,28 +369,41 @@
<section>
<title>What Should Not be Cached?</title>
- <p>In short, any content which is highly time-sensitive, or which varies
- depending on the particulars of the request that are not covered by
- HTTP negotiation, should not be cached.</p>
+ <p>It should be up to the client creating the request, or the origin
+ server constructing the response, to decide whether the content
+ should be cacheable by correctly setting the
+ <code>Cache-Control</code> header, and <module>mod_cache</module> should
+ be left alone to honor the wishes of the client or server as appropriate.
+ </p>
- <p>If you have dynamic content which changes depending on the IP address
- of the requester, or changes every 5 minutes, it should almost certainly
- not be cached.</p>
+ <p>Content that is time sensitive, or which varies depending on the
+ particulars of the request that are not covered by HTTP negotiation,
+ should not be cached. This content should declare itself uncacheable
+ using the <code>Cache-Control</code> header.</p>
+
+ <p>If content changes often, expressed by a freshness lifetime of minutes
+ or seconds, the content can still be cached, however it is highly
+ desirable that the origin server supports
+ <strong>conditional requests</strong> correctly to ensure that
+ full responses do not have to be generated on a regular basis.</p>
+
+ <p>Content that varies based on client provided request headers can be
+ cached through intelligent use of the <code>Vary</code> response
+ header.</p>
- <p>If on the other hand, the content served differs depending on the
- values of various HTTP headers, it might be possible
- to cache it intelligently through the use of a "Vary" header.</p>
</section>
<section>
<title>Variable/Negotiated Content</title>
- <p>If a response with a "Vary" header is received by
- <module>mod_cache</module> when requesting content by the backend it
- will attempt to handle it intelligently. If possible,
- <module>mod_cache</module> will detect the headers attributed in the
- "Vary" response in future requests and serve the correct cached
- response.</p>
+ <p>When the origin server is designed to respond with different content
+ based on the value of headers in the request, for example to serve
+ multiple languages at the same URL, HTTP's caching mechanism makes it
+ possible to cache multiple variants of the same page at the same URL.</p>
+
+ <p>This is done by the origin server adding a <code>Vary</code> header
+ to indicate which headers must be taken into account by a cache when
+ determining whether two variants are different from one another.</p>
<p>If for example, a response is received with a vary header such as;</p>
@@ -311,270 +414,36 @@ Vary: negotiate,accept-language,accept-charset
<p><module>mod_cache</module> will only serve the cached content to
requesters with accept-language and accept-charset headers
matching those of the original request.</p>
+
+ <p>Multiple variants of the content can be cached side by side.
+ <module>mod_cache</module> uses the <code>Vary</code> header and the
+ corresponding values of the request headers listed by <code>Vary</code>
+ to decide on which of many variants to return to the client.</p>
</section>
- </section>
-
- <section id="security">
- <title>Security Considerations</title>
-
- <section>
- <title>Authorization and Access Control</title>
-
- <p>Using <module>mod_cache</module> is very much like having a built
- in reverse-proxy. Requests will be served by the caching module unless
- it determines that the backend should be queried. When caching local
- resources, this drastically changes the security model of httpd.</p>
-
- <p>As traversing a filesystem hierarchy to examine potential
- <code>.htaccess</code> files would be a very expensive operation,
- partially defeating the point of caching (to speed up requests),
- <module>mod_cache</module> makes no decision about whether a cached
- entity is authorised for serving. In other words; if
- <module>mod_cache</module> has cached some content, it will be served
- from the cache as long as that content has not expired.</p>
-
- <p>If, for example, your configuration permits access to a resource by IP
- address you should ensure that this content is not cached. You can do this
- by using the <directive module="mod_cache">CacheDisable</directive>
- directive, or <module>mod_expires</module>. Left unchecked,
- <module>mod_cache</module> - very much like a reverse proxy - would cache
- the content when served and then serve it to any client, on any IP
- address.</p>
- </section>
-
- <section>
- <title>Local exploits</title>
-
- <p>As requests to end-users can be served from the cache, the cache
- itself can become a target for those wishing to deface or interfere with
- content. It is important to bear in mind that the cache must at all
- times be writable by the user which httpd is running as. This is in
- stark contrast to the usually recommended situation of maintaining
- all content unwritable by the Apache user.</p>
-
- <p>If the Apache user is compromised, for example through a flaw in
- a CGI process, it is possible that the cache may be targeted. When
- using <module>mod_cache_disk</module>, it is relatively easy to
- insert or modify a cached entity.</p>
-
- <p>This presents a somewhat elevated risk in comparison to the other
- types of attack it is possible to make as the Apache user. If you are
- using <module>mod_cache_disk</module> you should bear this in mind -
- ensure you upgrade httpd when security upgrades are announced and
- run CGI processes as a non-Apache user using <a
- href="suexec.html">suEXEC</a> if possible.</p>
-
- </section>
-
- <section>
- <title>Cache Poisoning</title>
-
- <p>When running httpd as a caching proxy server, there is also the
- potential for so-called cache poisoning. Cache Poisoning is a broad
- term for attacks in which an attacker causes the proxy server to
- retrieve incorrect (and usually undesirable) content from the backend.
- </p>
-
- <p>For example if the DNS servers used by your system running
- httpd
- are vulnerable to DNS cache poisoning, an attacker may be able to control
- where httpd connects to when requesting content from the origin server.
- Another example is so-called HTTP request-smuggling attacks.</p>
-
- <p>This document is not the correct place for an in-depth discussion
- of HTTP request smuggling (instead, try your favourite search engine)
- however it is important to be aware that it is possible to make
- a series of requests, and to exploit a vulnerability on an origin
- webserver such that the attacker can entirely control the content
- retrieved by the proxy.</p>
- </section>
- </section>
- <section id="filehandle">
- <title>File-Handle Caching</title>
+ <section id="disk">
+ <title>Caching to Disk</title>
- <related>
- <modulelist>
- <module>mod_file_cache</module>
- </modulelist>
- <directivelist>
- <directive module="mod_file_cache">CacheFile</directive>
- </directivelist>
- </related>
+ <p>The <module>mod_cache</module> module relies on specific backend store
+ implementations in order to manage the cache. For caching to disk,
+ <module>mod_cache_disk</module> is provided.</p>
- <p>The act of opening a file can itself be a source of delay, particularly
- on network filesystems. By maintaining a cache of open file descriptors
- for commonly served files, httpd can avoid this delay. Currently
- httpd
- provides one implementation of File-Handle Caching.</p>
-
- <section>
- <title>CacheFile</title>
-
- <p>The most basic form of caching present in httpd is the file-handle
- caching provided by <module>mod_file_cache</module>. Rather than caching
- file-contents, this cache maintains a table of open file descriptors. Files
- to be cached in this manner are specified in the configuration file using
- the <directive module="mod_file_cache">CacheFile</directive>
- directive.</p>
-
- <p>The
- <directive module="mod_file_cache">CacheFile</directive> directive
- instructs httpd to open the file when it is started and to re-use
- this file-handle for all subsequent access to this file.</p>
+ <p>Typically the module will be configured as follows:</p>
<example>
- CacheFile /usr/local/apache2/htdocs/index.html
- </example>
-
- <p>If you intend to cache a large number of files in this manner, you
- must ensure that your operating system's limit for the number of open
- files is set appropriately.</p>
-
- <p>Although using <directive module="mod_file_cache">CacheFile</directive>
- does not cause the file-contents to be cached per-se, it does mean
- that if the file changes while httpd is running these changes will
- not be picked up. The file will be consistently served as it was
- when httpd was started.</p>
-
- <p>If the file is removed while httpd is running, it will continue
- to maintain an open file descriptor and serve the file as it was when
- httpd was started. This usually also means that although the file
- will have been deleted, and not show up on the filesystem, extra free
- space will not be recovered until httpd is stopped and the file
- descriptor closed.</p>
- </section>
-
- </section>
-
- <section id="inmemory">
- <title>In-Memory Caching</title>
-
- <related>
- <modulelist>
- <module>mod_file_cache</module>
- </modulelist>
- <directivelist>
- <directive module="mod_cache">CacheEnable</directive>
- <directive module="mod_cache">CacheDisable</directive>
- <directive module="mod_file_cache">MMapFile</directive>
- </directivelist>
- </related>
-
- <p>Serving directly from system memory is universally the fastest method
- of serving content. Reading files from a disk controller or, even worse,
- from a remote network is orders of magnitude slower. Disk controllers
- usually involve physical processes, and network access is limited by
- your available bandwidth. Memory access on the other hand can take mere
- nano-seconds.</p>
-
- <p>System memory isn't cheap though, byte for byte it's by far the most
- expensive type of storage and it's important to ensure that it is used
- efficiently. By caching files in memory you decrease the amount of
- memory available on the system. As we'll see, in the case of operating
- system caching, this is not so much of an issue, but when using
- httpd's own in-memory caching it is important to make sure that you
- do not allocate too much memory to a cache. Otherwise the system
- will be forced to swap out memory, which will likely degrade
- performance.</p>
-
- <section>
- <title>Operating System Caching</title>
-
- <p>Almost all modern operating systems cache file-data in memory managed
- directly by the kernel. This is a powerful feature, and for the most
- part operating systems get it right. For example, on Linux, let's look at
- the difference in the time it takes to read a file for the first time
- and the second time;</p>
-
- <example><pre>
-colm@coroebus:~$ time cat testfile &gt; /dev/null
-real 0m0.065s
-user 0m0.000s
-sys 0m0.001s
-colm@coroebus:~$ time cat testfile &gt; /dev/null
-real 0m0.003s
-user 0m0.003s
-sys 0m0.000s</pre>
- </example>
-
- <p>Even for this small file, there is a huge difference in the amount
- of time it takes to read the file. This is because the kernel has cached
- the file contents in memory.</p>
-
- <p>By ensuring there is "spare" memory on your system, you can ensure
- that more and more file-contents will be stored in this cache. This
- can be a very efficient means of in-memory caching, and involves no
- extra configuration of httpd at all.</p>
-
- <p>Additionally, because the operating system knows when files are
- deleted or modified, it can automatically remove file contents from the
- cache when necessary. This is a big advantage over httpd's in-memory
- caching which has no way of knowing when a file has changed.</p>
- </section>
-
- <p>Despite the performance and advantages of automatic operating system
- caching there are some circumstances in which in-memory caching may be
- better performed by httpd.</p>
-
- <section>
- <title>MMapFile Caching</title>
-
- <p><module>mod_file_cache</module> provides the
- <directive module="mod_file_cache">MMapFile</directive> directive, which
- allows you to have httpd map a static file's contents into memory at
- start time (using the mmap system call). httpd will use the in-memory
- contents for all subsequent accesses to this file.</p>
-
- <example>
- MMapFile /usr/local/apache2/htdocs/index.html
- </example>
-
- <p>As with the
- <directive module="mod_file_cache">CacheFile</directive> directive, any
- changes in these files will not be picked up by httpd after it has
- started.</p>
-
- <p> The <directive module="mod_file_cache">MMapFile</directive>
- directive does not keep track of how much memory it allocates, so
- you must ensure not to over-use the directive. Each httpd child
- process will replicate this memory, so it is critically important
- to ensure that the files mapped are not so large as to cause the
- system to swap memory.</p>
- </section>
- </section>
-
- <section id="disk">
- <title>Disk-based Caching</title>
-
- <related>
- <modulelist>
- <module>mod_cache_disk</module>
- </modulelist>
- <directivelist>
- <directive module="mod_cache">CacheEnable</directive>
- <directive module="mod_cache">CacheDisable</directive>
- </directivelist>
- </related>
-
- <p><module>mod_cache_disk</module> provides a disk-based caching mechanism
- for <module>mod_cache</module>. This cache is intelligent and content will
- be served from the cache only as long as it is considered valid.</p>
-
- <p>Typically the module will be configured as so;</p>
-
- <example>
CacheRoot /var/cache/apache/<br />
CacheEnable disk /<br />
CacheDirLevels 2<br />
CacheDirLength 1
- </example>
+ </example>
- <p>Importantly, as the cached files are locally stored, operating system
- in-memory caching will typically be applied to their access also. So
- although the files are stored on disk, if they are frequently accessed
- it is likely the operating system will ensure that they are actually
- served from memory.</p>
+ <p>Importantly, as the cached files are locally stored, operating system
+ in-memory caching will typically be applied to their access also. So
+ although the files are stored on disk, if they are frequently accessed
+ it is likely the operating system will ensure that they are actually
+ served from memory.</p>
+
+ </section>
<section>
<title>Understanding the Cache-Store</title>
@@ -582,7 +451,8 @@ CacheDirLength 1
<p>To store items in the cache, <module>mod_cache_disk</module> creates
a 22 character hash of the URL being requested. This hash incorporates
the hostname, protocol, port, path and any CGI arguments to the URL,
- to ensure that multiple URLs do not collide.</p>
+ as well as elements defined by the Vary header to ensure that multiple
+ URLs do not collide with one another.</p>
 <p>Each character may be any one of 64 different characters, which means
that overall there are 64^22 possible hashes. For example, a URL might
@@ -634,14 +504,14 @@ CacheDirLength 1
<section>
<title>Maintaining the Disk Cache</title>
- <p>Although <module>mod_cache_disk</module> will remove cached content
- as it is expired, it does not maintain any information on the total
- size of the cache or how little free space may be left.</p>
+ <p>The <module>mod_cache_disk</module> module makes no attempt to
+ regulate the amount of disk space used by the cache, although it
+ will gracefully stand down on any disk error and behave as if the
+ cache was never present.</p>
<p>Instead, provided with httpd is the <a
- href="programs/htcacheclean.html">htcacheclean</a> tool which, as the name
- suggests, allows you to clean the cache periodically. Determining
- how frequently to run <a
+ href="programs/htcacheclean.html">htcacheclean</a> tool which allows you
+ to clean the cache periodically. Determining how frequently to run <a
href="programs/htcacheclean.html">htcacheclean</a> and what target size to
use for the cache is somewhat complex and trial and error may be needed to
select optimal values.</p>
@@ -653,6 +523,10 @@ CacheDirLength 1
or more to process very large (tens of gigabytes) caches and if you are
running it from cron it is recommended that you determine how long a typical
run takes, to avoid running more than one instance at a time.</p>
+
+ <p>It is also recommended that an appropriate "nice" level is chosen for
+ htcacheclean so that the tool does not cause excessive disk I/O while the
+ server is running.</p>
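+
+ <p>As a rough sketch only, a nightly cron entry along the following lines
+ would run a single low-priority cleaning pass. The schedule, the location
+ of the binary, the cache path and the 512M size limit are assumptions chosen
+ for illustration and should be adapted to your own installation:</p>
+
+ <example><pre>
+# Hypothetical crontab entry: one niced cleaning pass at 03:00
+# -n be nice, -t remove empty directories, -p cache root, -l size limit
+0 3 * * * nice -n 19 /usr/local/apache2/bin/htcacheclean -n -t -p /var/cache/apache -l 512M</pre>
+ </example>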
<p class="figure">
<img src="images/caching_fig1.gif" alt="" width="600"
@@ -668,4 +542,344 @@ CacheDirLength 1
</section>
+ <section id="socache-caching">
+
+ <title>Two-state Key/Value Shared Object Caching</title>
+
+ <related>
+ <modulelist>
+ <module>mod_authn_socache</module>
+ <module>mod_socache_dbm</module>
+ <module>mod_socache_dc</module>
+ <module>mod_socache_memcache</module>
+ <module>mod_socache_shmcb</module>
+ <module>mod_ssl</module>
+ </modulelist>
+ <directivelist>
+ <directive module="mod_authn_socache">AuthnCacheSOCache</directive>
+ <directive module="mod_ssl">SSLSessionCache</directive>
+ <directive module="mod_ssl">SSLStaplingCache</directive>
+ </directivelist>
+ </related>
+
+ <p>The Apache HTTP server offers a low-level shared object cache, exposed
+ through the <a href="socache.html">socache</a> interface, for caching
+ information such as SSL sessions or authentication credentials.</p>
+
+ <p>Additional modules are provided for each implementation, offering the
+ following backends:</p>
+
+ <dl>
+ <dt><module>mod_socache_dbm</module></dt>
+ <dd>DBM based shared object cache.</dd>
+ <dt><module>mod_socache_dc</module></dt>
+ <dd>Distcache based shared object cache.</dd>
+ <dt><module>mod_socache_memcache</module></dt>
+ <dd>Memcache based shared object cache.</dd>
+ <dt><module>mod_socache_shmcb</module></dt>
+ <dd>Shared memory based shared object cache.</dd>
+ </dl>
+
+ <section id="mod_authn_socache-caching">
+ <title>Caching Authentication Credentials</title>
+
+ <related>
+ <modulelist>
+ <module>mod_authn_socache</module>
+ </modulelist>
+ <directivelist>
+ <directive module="mod_authn_socache">AuthnCacheSOCache</directive>
+ </directivelist>
+ </related>
+
+ <p>The <module>mod_authn_socache</module> module allows the result of
+ authentication to be cached, relieving load on authentication backends.</p>
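+
+ <p>A minimal sketch of such a setup, assuming a database backend accessed
+ through <module>mod_authn_dbd</module>, might look like the following. The
+ directory path, realm name, SQL query, context string and the choice of the
+ <code>shmcb</code> provider are illustrative assumptions:</p>
+
+ <example><pre>
+# Server-wide choice of socache provider (assumed here: shmcb)
+AuthnCacheSOCache shmcb
+&lt;Directory "/usr/www/myhost/private"&gt;
+    AuthType Basic
+    AuthName "Cached Authentication Example"
+    # Consult the cache first, then fall back to the dbd backend
+    AuthBasicProvider socache dbd
+    AuthDBDUserPWQuery "SELECT password FROM authn WHERE user = %s"
+    # Store results obtained from the dbd provider in the cache
+    AuthnCacheProvideFor dbd
+    AuthnCacheContext dbd-authn-example
+    Require valid-user
+&lt;/Directory&gt;</pre>
+ </example>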
+
+ </section>
+
+ <section id="mod_ssl-caching">
+ <title>Caching SSL Sessions</title>
+
+ <related>
+ <modulelist>
+ <module>mod_ssl</module>
+ </modulelist>
+ <directivelist>
+ <directive module="mod_ssl">SSLSessionCache</directive>
+ <directive module="mod_ssl">SSLStaplingCache</directive>
+ </directivelist>
+ </related>
+
+ <p>The <module>mod_ssl</module> module uses the <code>socache</code> interface
+ to provide a session cache and a stapling cache.</p>
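+
+ <p>As an illustration, both caches might be backed by the shared memory
+ provider as sketched below. The file paths, cache sizes and timeout are
+ assumed values, and <module>mod_socache_shmcb</module> is assumed to be
+ loaded:</p>
+
+ <example><pre>
+# Assumed paths and sizes, for illustration only
+SSLSessionCache "shmcb:logs/ssl_scache(512000)"
+SSLSessionCacheTimeout 300
+SSLStaplingCache "shmcb:logs/ssl_stapling(32768)"</pre>
+ </example>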
+
+ </section>
+
+ </section>
+
+ <section id="file-caching">
+
+ <title>Specialized File Caching</title>
+
+ <related>
+ <modulelist>
+ <module>mod_file_cache</module>
+ </modulelist>
+ <directivelist>
+ <directive module="mod_file_cache">CacheFile</directive>
+ <directive module="mod_file_cache">MMapFile</directive>
+ </directivelist>
+ </related>
+
+ <p>On platforms where a filesystem might be slow, or where file
+ handles are expensive, the option exists to pre-load files into
+ memory on startup.</p>
+
+ <p>On systems where opening files is slow, the option exists to
+ open the file on startup and cache the file handle. These
+ options can help on systems where access to static files is
+ slow.</p>
+
+ <section id="filehandle">
+ <title>File-Handle Caching</title>
+
+ <p>The act of opening a file can itself be a source of delay, particularly
+ on network filesystems. By maintaining a cache of open file descriptors
+ for commonly served files, httpd can avoid this delay. Currently httpd
+ provides one implementation of File-Handle Caching.</p>
+
+ <section>
+ <title>CacheFile</title>
+
+ <p>The most basic form of caching present in httpd is the file-handle
+ caching provided by <module>mod_file_cache</module>. Rather than caching
+ file-contents, this cache maintains a table of open file descriptors. Files
+ to be cached in this manner are specified in the configuration file using
+ the <directive module="mod_file_cache">CacheFile</directive>
+ directive.</p>
+
+ <p>The
+ <directive module="mod_file_cache">CacheFile</directive> directive
+ instructs httpd to open the file when it is started and to re-use
+ this file-handle for all subsequent access to this file.</p>
+
+ <example>
+ CacheFile /usr/local/apache2/htdocs/index.html
+ </example>
+
+ <p>If you intend to cache a large number of files in this manner, you
+ must ensure that your operating system's limit for the number of open
+ files is set appropriately.</p>
+
+ <p>Although using <directive module="mod_file_cache">CacheFile</directive>
+ does not cause the file-contents to be cached per se, it does mean
+ that if the file changes while httpd is running these changes will
+ not be picked up. The file will be consistently served as it was
+ when httpd was started.</p>
+
+ <p>If the file is removed while httpd is running, it will continue
+ to maintain an open file descriptor and serve the file as it was when
+ httpd was started. This usually also means that although the file
+ will have been deleted, and not show up on the filesystem, extra free
+ space will not be recovered until httpd is stopped and the file
+ descriptor closed.</p>
+ </section>
+
+ </section>
+
+ <section id="inmemory">
+ <title>In-Memory Caching</title>
+
+ <p>Serving directly from system memory is universally the fastest method
+ of serving content. Reading files from a disk controller or, even worse,
+ from a remote network is orders of magnitude slower. Disk controllers
+ usually involve physical processes, and network access is limited by
+ your available bandwidth. Memory access on the other hand can take mere
+ nanoseconds.</p>
+
+ <p>System memory isn't cheap though; byte for byte it's by far the most
+ expensive type of storage and it's important to ensure that it is used
+ efficiently. By caching files in memory you decrease the amount of
+ memory available on the system. As we'll see, in the case of operating
+ system caching, this is not so much of an issue, but when using
+ httpd's own in-memory caching it is important to make sure that you
+ do not allocate too much memory to a cache. Otherwise the system
+ will be forced to swap out memory, which will likely degrade
+ performance.</p>
+
+ <section>
+ <title>Operating System Caching</title>
+
+ <p>Almost all modern operating systems cache file-data in memory managed
+ directly by the kernel. This is a powerful feature, and for the most
+ part operating systems get it right. For example, on Linux, let's look at
+ the difference in the time it takes to read a file for the first time
+ and the second time:</p>
+
+ <example><pre>
+colm@coroebus:~$ time cat testfile &gt; /dev/null
+real 0m0.065s
+user 0m0.000s
+sys 0m0.001s
+colm@coroebus:~$ time cat testfile &gt; /dev/null
+real 0m0.003s
+user 0m0.003s
+sys 0m0.000s</pre>
+ </example>
+
+ <p>Even for this small file, there is a huge difference in the amount
+ of time it takes to read the file. This is because the kernel has cached
+ the file contents in memory.</p>
+
+ <p>By ensuring there is "spare" memory on your system, you can ensure
+ that more and more file-contents will be stored in this cache. This
+ can be a very efficient means of in-memory caching, and involves no
+ extra configuration of httpd at all.</p>
+
+ <p>Additionally, because the operating system knows when files are
+ deleted or modified, it can automatically remove file contents from the
+ cache when necessary. This is a big advantage over httpd's in-memory
+ caching which has no way of knowing when a file has changed.</p>
+ </section>
+
+ <p>Despite the performance and advantages of automatic operating system
+ caching there are some circumstances in which in-memory caching may be
+ better performed by httpd.</p>
+
+ <section>
+ <title>MMapFile Caching</title>
+
+ <p><module>mod_file_cache</module> provides the
+ <directive module="mod_file_cache">MMapFile</directive> directive, which
+ allows you to have httpd map a static file's contents into memory at
+ start time (using the mmap system call). httpd will use the in-memory
+ contents for all subsequent accesses to this file.</p>
+
+ <example>
+ MMapFile /usr/local/apache2/htdocs/index.html
+ </example>
+
+ <p>As with the
+ <directive module="mod_file_cache">CacheFile</directive> directive, any
+ changes in these files will not be picked up by httpd after it has
+ started.</p>
+
+ <p>The <directive module="mod_file_cache">MMapFile</directive>
+ directive does not keep track of how much memory it allocates, so
+ you must take care not to overuse the directive. Each httpd child
+ process will replicate this memory, so it is critically important
+ to ensure that the files mapped are not so large as to cause the
+ system to swap memory.</p>
+ </section>
+ </section>
+
+ </section>
+
+ <section id="security">
+ <title>Security Considerations</title>
+
+ <section>
+ <title>Authorization and Access Control</title>
+
+ <p>Using <module>mod_cache</module> in its default state where
+ <directive module="mod_cache">CacheQuickHandler</directive> is set to
+ <code>On</code> is very much like having a caching reverse-proxy bolted
+ to the front of the server. Requests will be served by the caching module
+ unless it determines that the origin server should be queried, just as an
+ external cache would. This drastically changes the security model of
+ httpd.</p>
+
+ <p>As traversing a filesystem hierarchy to examine potential
+ <code>.htaccess</code> files would be a very expensive operation,
+ partially defeating the point of caching (to speed up requests),
+ <module>mod_cache</module> makes no decision about whether a cached
+ entity is authorised for serving. In other words, if
+ <module>mod_cache</module> has cached some content, it will be served
+ from the cache as long as that content has not expired.</p>
+
+ <p>If, for example, your configuration permits access to a resource by IP
+ address, you should ensure that this content is not cached. You can do this
+ by using the <directive module="mod_cache">CacheDisable</directive>
+ directive, or <module>mod_expires</module>. Left unchecked,
+ <module>mod_cache</module> - very much like a reverse proxy - would cache
+ the content when served and then serve it to any client, on any IP
+ address.</p>
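+
+ <p>As a sketch, assuming the IP-restricted content lives under a
+ hypothetical <code>/private</code> URL path, it could be excluded from an
+ otherwise site-wide disk cache like this:</p>
+
+ <example><pre>
+# Cache everything except the (hypothetical) restricted area
+CacheEnable disk /
+CacheDisable /private</pre>
+ </example>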
+
+ <p>When the <directive module="mod_cache">CacheQuickHandler</directive>
+ directive is set to <code>Off</code>, the full set of request processing
+ phases is executed and the security model remains unchanged.</p>
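+
+ <p>A minimal sketch of this mode follows. The <code>/private</code> path
+ and the address range are assumptions for illustration. With the quick
+ handler disabled, the access check below is evaluated before a cached
+ response is served:</p>
+
+ <example><pre>
+# Run the cache as a normal handler so that authorization is applied
+CacheQuickHandler Off
+CacheEnable disk /
+&lt;Location "/private"&gt;
+    Require ip 192.0.2.0/24
+&lt;/Location&gt;</pre>
+ </example>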
+ </section>
+
+ <section>
+ <title>Local exploits</title>
+
+ <p>As requests from end users can be served from the cache, the cache
+ itself can become a target for those wishing to deface or interfere with
+ content. It is important to bear in mind that the cache must at all
+ times be writable by the user that httpd runs as. This is in
+ stark contrast to the usually recommended situation of maintaining
+ all content unwritable by the Apache user.</p>
+
+ <p>If the Apache user is compromised, for example through a flaw in
+ a CGI process, it is possible that the cache may be targeted. When
+ using <module>mod_cache_disk</module>, it is relatively easy to
+ insert or modify a cached entity.</p>
+
+ <p>This presents a somewhat elevated risk in comparison to the other
+ types of attack it is possible to make as the Apache user. If you are
+ using <module>mod_cache_disk</module> you should bear this in mind -
+ ensure you upgrade httpd when security upgrades are announced and
+ run CGI processes as a non-Apache user using <a
+ href="suexec.html">suEXEC</a> if possible.</p>
+
+ </section>
+
+ <section>
+ <title>Cache Poisoning</title>
+
+ <p>When running httpd as a caching proxy server, there is also the
+ potential for so-called cache poisoning. Cache Poisoning is a broad
+ term for attacks in which an attacker causes the proxy server to
+ retrieve incorrect (and usually undesirable) content from the origin
+ server.</p>
+
+ <p>For example, if the DNS servers used by your system running httpd
+ are vulnerable to DNS cache poisoning, an attacker may be able to control
+ where httpd connects to when requesting content from the origin server.
+ Another example is so-called HTTP request-smuggling attacks.</p>
+
+ <p>This document is not the correct place for an in-depth discussion
+ of HTTP request smuggling (instead, try your favourite search engine).
+ However, it is important to be aware that it is possible to make
+ a series of requests, and to exploit a vulnerability on an origin
+ webserver such that the attacker can entirely control the content
+ retrieved by the proxy.</p>
+ </section>
+
+ <section>
+ <title>Denial of Service / Cachebusting</title>
+
+ <p>The Vary mechanism allows multiple variants of the same URL to be
+ cached side by side. Depending on header values provided by the client,
+ the cache will select the correct variant to return to the client. This
+ mechanism can become a problem when an attempt is made to vary on a
+ header that is known to contain a wide range of possible values under
+ normal use, for example the <code>User-Agent</code> header. Depending
+ on the popularity of the particular web site, thousands or millions of
+ duplicate cache entries could be created for the same URL, crowding
+ out other entries in the cache.</p>
+
+ <p>In other cases, there may be a need to change the URL of a particular
+ resource on every request, usually by adding a "cachebuster" string to
+ the URL. If this content is declared cacheable by a server for a
+ significant freshness lifetime, these entries can crowd out
+ legitimate entries in a cache. While <module>mod_cache</module>
+ provides a
+ <directive module="mod_cache">CacheIgnoreURLSessionIdentifiers</directive>
+ directive, this directive should be used with care to ensure that
+ downstream proxy or browser caches aren't subjected to the same denial
+ of service issue.</p>
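+
+ <p>By way of illustration, assuming the varying part of the URL is a
+ session identifier named <code>jsessionid</code> (an assumed name; use
+ whatever identifier your application actually appends), the directive
+ could be set as follows so that such URLs share one cache entry:</p>
+
+ <example><pre>
+# Ignore the (assumed) jsessionid identifier when building the cache key
+CacheIgnoreURLSessionIdentifiers jsessionid</pre>
+ </example>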
+ </section>
+ </section>
+
</manualpage>