author | Rich Bowen <rbowen@apache.org> | 2001-09-22 20:53:20 +0200 |
---|---|---|
committer | Rich Bowen <rbowen@apache.org> | 2001-09-22 20:53:20 +0200 |
commit | 1bf05b9838e25403ff49e68c7ce8e26af90b6bd5 (patch) | |
tree | 8d0f0997663688543686f0dea197117a28730949 /docs/manual/content-negotiation.html.en | |
parent | By popular demand, the beginnings of an explanation of how the request (diff) | |
Ran w3c tidy on these as 'tidy -mi -asxml' to get xhtml. Please verify,
in particular, the non-english files, to make sure I did not screw
anything up. They look fine to me.
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@91112 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'docs/manual/content-negotiation.html.en')
-rw-r--r-- | docs/manual/content-negotiation.html.en | 1195 |
1 files changed, 623 insertions, 572 deletions
diff --git a/docs/manual/content-negotiation.html.en b/docs/manual/content-negotiation.html.en index a813fcf8c9..b5af8bf892 100644 --- a/docs/manual/content-negotiation.html.en +++ b/docs/manual/content-negotiation.html.en @@ -1,132 +1,127 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> -<HTML> -<HEAD> -<TITLE>Apache Content Negotiation</TITLE> -</HEAD> - -<!-- Background white, links blue (unvisited), navy (visited), red (active) --> -<BODY - BGCOLOR="#FFFFFF" - TEXT="#000000" - LINK="#0000FF" - VLINK="#000080" - ALINK="#FF0000" -> -<!--#include virtual="header.html" --> -<H1 ALIGN="CENTER">Content Negotiation</H1> - -<P> -Apache's support for content negotiation has been updated to meet the -HTTP/1.1 specification. It can choose the best representation of a -resource based on the browser-supplied preferences for media type, -languages, character set and encoding. It is also implements a -couple of features to give more intelligent handling of requests from -browsers which send incomplete negotiation information. <P> - -Content negotiation is provided by the -<A HREF="mod/mod_negotiation.html">mod_negotiation</A> module, -which is compiled in by default. - -<HR> - -<H2>About Content Negotiation</H2> - -<P> -A resource may be available in several different representations. For -example, it might be available in different languages or different -media types, or a combination. One way of selecting the most -appropriate choice is to give the user an index page, and let them -select. However it is often possible for the server to choose -automatically. This works because browsers can send as part of each -request information about what representations they prefer. For -example, a browser could indicate that it would like to see -information in French, if possible, else English will do. Browsers -indicate their preferences by headers in the request. 
To request only -French representations, the browser would send - -<PRE> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> + +<html xmlns="http://www.w3.org/1999/xhtml"> + <head> + <meta name="generator" content="HTML Tidy, see www.w3.org" /> + + <title>Apache Content Negotiation</title> + </head> + <!-- Background white, links blue (unvisited), navy (visited), red (active) --> + + <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" + vlink="#000080" alink="#FF0000"> + <!--#include virtual="header.html" --> + + <h1 align="CENTER">Content Negotiation</h1> + + <p>Apache's support for content negotiation has been updated to + meet the HTTP/1.1 specification. It can choose the best + representation of a resource based on the browser-supplied + preferences for media type, languages, character set and + encoding. It is also implements a couple of features to give + more intelligent handling of requests from browsers which send + incomplete negotiation information.</p> + + <p>Content negotiation is provided by the <a + href="mod/mod_negotiation.html">mod_negotiation</a> module, + which is compiled in by default.</p> + <hr /> + + <h2>About Content Negotiation</h2> + + <p>A resource may be available in several different + representations. For example, it might be available in + different languages or different media types, or a combination. + One way of selecting the most appropriate choice is to give the + user an index page, and let them select. However it is often + possible for the server to choose automatically. This works + because browsers can send as part of each request information + about what representations they prefer. For example, a browser + could indicate that it would like to see information in French, + if possible, else English will do. Browsers indicate their + preferences by headers in the request. 
To request only French + representations, the browser would send</p> +<pre> Accept-Language: fr -</PRE> - -<P> -Note that this preference will only be applied when there is a choice -of representations and they vary by language. -<P> - -As an example of a more complex request, this browser has been -configured to accept French and English, but prefer French, and to -accept various media types, preferring HTML over plain text or other -text types, and preferring GIF or JPEG over other media types, but also -allowing any other media type as a last resort: - -<PRE> +</pre> + + <p>Note that this preference will only be applied when there is + a choice of representations and they vary by language.</p> + + <p>As an example of a more complex request, this browser has + been configured to accept French and English, but prefer + French, and to accept various media types, preferring HTML over + plain text or other text types, and preferring GIF or JPEG over + other media types, but also allowing any other media type as a + last resort:</p> +<pre> Accept-Language: fr; q=1.0, en; q=0.5 Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1 -</PRE> - -Apache 1.2 supports 'server driven' content negotiation, as defined in -the HTTP/1.1 specification. It fully supports the Accept, -Accept-Language, Accept-Charset and Accept-Encoding request headers. -Apache 1.3.4 also supports 'transparent' content negotiation, which is -an experimental negotiation protocol defined in RFC 2295 and RFC 2296. -It does not offer support for 'feature negotiation' as defined in -these RFCs. -<P> - -A <STRONG>resource</STRONG> is a conceptual entity identified by a URI -(RFC 2396). An HTTP server like Apache provides access to -<STRONG>representations</STRONG> of the resource(s) within its namespace, -with each representation in the form of a sequence of bytes with a -defined media type, character set, encoding, etc. 
Each resource may be -associated with zero, one, or more than one representation -at any given time. If multiple representations are available, -the resource is referred to as <STRONG>negotiable</STRONG> and each of its -representations is termed a <STRONG>variant</STRONG>. The ways in which the -variants for a negotiable resource vary are called the -<STRONG>dimensions</STRONG> of negotiation. - -<H2>Negotiation in Apache</H2> - -<P> -In order to negotiate a resource, the server needs to be given -information about each of the variants. This is done in one of two -ways: - -<UL> - <LI> Using a type map (<EM>i.e.</EM>, a <CODE>*.var</CODE> file) which - names the files containing the variants explicitly, or - <LI> Using a 'MultiViews' search, where the server does an implicit - filename pattern match and chooses from among the results. -</UL> - -<H3>Using a type-map file</H3> - -<P> -A type map is a document which is associated with the handler -named <CODE>type-map</CODE> (or, for backwards-compatibility with -older Apache configurations, the mime type -<CODE>application/x-type-map</CODE>). Note that to use this feature, -you must have a handler set in the configuration that defines a -file suffix as <CODE>type-map</CODE>; this is best done with a - -<PRE> +</pre> + Apache 1.2 supports 'server driven' content negotiation, as + defined in the HTTP/1.1 specification. It fully supports the + Accept, Accept-Language, Accept-Charset and Accept-Encoding + request headers. Apache 1.3.4 also supports 'transparent' + content negotiation, which is an experimental negotiation + protocol defined in RFC 2295 and RFC 2296. It does not offer + support for 'feature negotiation' as defined in these RFCs. + + <p>A <strong>resource</strong> is a conceptual entity + identified by a URI (RFC 2396). 
An HTTP server like Apache + provides access to <strong>representations</strong> of the + resource(s) within its namespace, with each representation in + the form of a sequence of bytes with a defined media type, + character set, encoding, etc. Each resource may be associated + with zero, one, or more than one representation at any given + time. If multiple representations are available, the resource + is referred to as <strong>negotiable</strong> and each of its + representations is termed a <strong>variant</strong>. The ways + in which the variants for a negotiable resource vary are called + the <strong>dimensions</strong> of negotiation.</p> + + <h2>Negotiation in Apache</h2> + + <p>In order to negotiate a resource, the server needs to be + given information about each of the variants. This is done in + one of two ways:</p> + + <ul> + <li>Using a type map (<em>i.e.</em>, a <code>*.var</code> + file) which names the files containing the variants + explicitly, or</li> + + <li>Using a 'MultiViews' search, where the server does an + implicit filename pattern match and chooses from among the + results.</li> + </ul> + + <h3>Using a type-map file</h3> + + <p>A type map is a document which is associated with the + handler named <code>type-map</code> (or, for + backwards-compatibility with older Apache configurations, the + mime type <code>application/x-type-map</code>). Note that to + use this feature, you must have a handler set in the + configuration that defines a file suffix as + <code>type-map</code>; this is best done with a</p> +<pre> AddHandler type-map .var -</PRE> - -in the server configuration file.<p> - -Type map files should have the same name as the resource which they are -describing, and have an entry for each available variant; these entries -consist of contiguous HTTP-format header lines. Entries for -different variants are separated by blank lines. Blank lines are -illegal within an entry. 
It is conventional to begin a map file with -an entry for the combined entity as a whole (although this -is not required, and if present will be ignored). An example -map file is shown below. This file would be named <code>foo.html</code>, -as it describes a resource named <code>foo</code>. - -<PRE> +</pre> + in the server configuration file. + + <p>Type map files should have the same name as the resource + which they are describing, and have an entry for each available + variant; these entries consist of contiguous HTTP-format header + lines. Entries for different variants are separated by blank + lines. Blank lines are illegal within an entry. It is + conventional to begin a map file with an entry for the combined + entity as a whole (although this is not required, and if + present will be ignored). An example map file is shown below. + This file would be named <code>foo.html</code>, as it describes + a resource named <code>foo</code>.</p> +<pre> URI: foo URI: foo.en.html @@ -136,16 +131,13 @@ as it describes a resource named <code>foo</code>. URI: foo.fr.de.html Content-type: text/html;charset=iso-8859-2 Content-language: fr, de -</PRE> - -Note also that a typemap file will take precedence over the filename's -extension, even when Multiviews is on. - -If the variants have different source qualities, that may be indicated -by the "qs" parameter to the media type, as in this picture (available -as jpeg, gif, or ASCII-art): - -<PRE> +</pre> + Note also that a typemap file will take precedence over the + filename's extension, even when Multiviews is on. If the + variants have different source qualities, that may be indicated + by the "qs" parameter to the media type, as in this picture + (available as jpeg, gif, or ASCII-art): +<pre> URI: foo URI: foo.jpeg @@ -156,445 +148,504 @@ as jpeg, gif, or ASCII-art): URI: foo.txt Content-type: text/plain; qs=0.01 -</PRE> -<P> - -qs values can vary in the range 0.000 to 1.000. 
Note that any variant with -a qs value of 0.000 will never be chosen. Variants with no 'qs' -parameter value are given a qs factor of 1.0. The qs parameter indicates -the relative 'quality' of this variant compared to the other available -variants, independent of the client's capabilities. For example, a jpeg -file is usually of higher source quality than an ascii file if it is -attempting to represent a photograph. However, if the resource being -represented is an original ascii art, then an ascii representation would -have a higher source quality than a jpeg representation. A qs value -is therefore specific to a given variant depending on the nature of -the resource it represents. - -<P> -The full list of headers recognized is: - -<DL> - <DT> <CODE>URI:</CODE> - <DD> uri of the file containing the variant (of the given media - type, encoded with the given content encoding). These are - interpreted as URLs relative to the map file; they must be on - the same server (!), and they must refer to files to which the - client would be granted access if they were to be requested - directly. - <DT> <CODE>Content-Type:</CODE> - <DD> media type --- charset, level and "qs" parameters may be given. These - are often referred to as MIME types; typical media types are - <CODE>image/gif</CODE>, <CODE>text/plain</CODE>, or - <CODE>text/html; level=3</CODE>. - <DT> <CODE>Content-Language:</CODE> - <DD> The languages of the variant, specified as an Internet standard - language tag from RFC 1766 (<EM>e.g.</EM>, <CODE>en</CODE> for English, - <CODE>kr</CODE> for Korean, <EM>etc.</EM>). - <DT> <CODE>Content-Encoding:</CODE> - <DD> If the file is compressed, or otherwise encoded, rather than - containing the actual raw data, this says how that was done. - Apache only recognizes encodings that are defined by an - <A HREF="mod/mod_mime.html#addencoding">AddEncoding</A> directive. 
- This normally includes the encodings <CODE>x-compress</CODE> - for compress'd files, and <CODE>x-gzip</CODE> for gzip'd files. - The <CODE>x-</CODE> prefix is ignored for encoding comparisons. - <DT> <CODE>Content-Length:</CODE> - <DD> The size of the file in bytes. Specifying content - lengths in the type-map allows the server to compare file sizes - without checking the actual files. - <DT> <CODE>Description:</CODE> - <DD> A human-readable textual description of the variant. If Apache cannot - find any appropriate variant to return, it will return an error - response which lists all available variants instead. Such a variant - list will include the human-readable variant descriptions. -</DL> - -Using a type map file is preferred over <code>MultiViews</code> because -it requires less CPU time, and less file access, to parse a file -explicitly listing the various resource variants, than to have to look -at every matching file, and parse its file extensions. - -<H3>Multiviews</H3> - -<P> -<CODE>MultiViews</CODE> is a per-directory option, meaning it can be set with -an <CODE>Options</CODE> directive within a <CODE><Directory></CODE>, -<CODE><Location></CODE> or <CODE><Files></CODE> -section in <CODE>access.conf</CODE>, or (if <CODE>AllowOverride</CODE> -is properly set) in <CODE>.htaccess</CODE> files. Note that -<CODE>Options All</CODE> does not set <CODE>MultiViews</CODE>; you -have to ask for it by name. - -<P> -The effect of <CODE>MultiViews</CODE> is as follows: if the server -receives a request for <CODE>/some/dir/foo</CODE>, if -<CODE>/some/dir</CODE> has <CODE>MultiViews</CODE> enabled, and -<CODE>/some/dir/foo</CODE> does <EM>not</EM> exist, then the server reads the -directory looking for files named foo.*, and effectively fakes up a -type map which names all those files, assigning them the same media -types and content-encodings it would have if the client had asked for -one of them by name. It then chooses the best match to the client's -requirements. 
- -<P> -<CODE>MultiViews</CODE> may also apply to searches for the file named by the -<CODE>DirectoryIndex</CODE> directive, if the server is trying to -index a directory. If the configuration files specify - -<PRE> +</pre> + + <p>qs values can vary in the range 0.000 to 1.000. Note that + any variant with a qs value of 0.000 will never be chosen. + Variants with no 'qs' parameter value are given a qs factor of + 1.0. The qs parameter indicates the relative 'quality' of this + variant compared to the other available variants, independent + of the client's capabilities. For example, a jpeg file is + usually of higher source quality than an ascii file if it is + attempting to represent a photograph. However, if the resource + being represented is an original ascii art, then an ascii + representation would have a higher source quality than a jpeg + representation. A qs value is therefore specific to a given + variant depending on the nature of the resource it + represents.</p> + + <p>The full list of headers recognized is:</p> + + <dl> + <dt><code>URI:</code></dt> + + <dd>uri of the file containing the variant (of the given + media type, encoded with the given content encoding). These + are interpreted as URLs relative to the map file; they must + be on the same server (!), and they must refer to files to + which the client would be granted access if they were to be + requested directly.</dd> + + <dt><code>Content-Type:</code></dt> + + <dd>media type --- charset, level and "qs" parameters may be + given. 
These are often referred to as MIME types; typical + media types are <code>image/gif</code>, + <code>text/plain</code>, or + <code>text/html; level=3</code>.</dd> + + <dt><code>Content-Language:</code></dt> + + <dd>The languages of the variant, specified as an Internet + standard language tag from RFC 1766 (<em>e.g.</em>, + <code>en</code> for English, <code>kr</code> for Korean, + <em>etc.</em>).</dd> + + <dt><code>Content-Encoding:</code></dt> + + <dd>If the file is compressed, or otherwise encoded, rather + than containing the actual raw data, this says how that was + done. Apache only recognizes encodings that are defined by an + <a href="mod/mod_mime.html#addencoding">AddEncoding</a> + directive. This normally includes the encodings + <code>x-compress</code> for compress'd files, and + <code>x-gzip</code> for gzip'd files. The <code>x-</code> + prefix is ignored for encoding comparisons.</dd> + + <dt><code>Content-Length:</code></dt> + + <dd>The size of the file in bytes. Specifying content lengths + in the type-map allows the server to compare file sizes + without checking the actual files.</dd> + + <dt><code>Description:</code></dt> + + <dd>A human-readable textual description of the variant. If + Apache cannot find any appropriate variant to return, it will + return an error response which lists all available variants + instead. Such a variant list will include the human-readable + variant descriptions.</dd> + </dl> + Using a type map file is preferred over <code>MultiViews</code> + because it requires less CPU time, and less file access, to + parse a file explicitly listing the various resource variants, + than to have to look at every matching file, and parse its file + extensions. 
+ + <h3>Multiviews</h3> + + <p><code>MultiViews</code> is a per-directory option, meaning + it can be set with an <code>Options</code> directive within a + <code><Directory></code>, <code><Location></code> + or <code><Files></code> section in + <code>access.conf</code>, or (if <code>AllowOverride</code> is + properly set) in <code>.htaccess</code> files. Note that + <code>Options All</code> does not set <code>MultiViews</code>; + you have to ask for it by name.</p> + + <p>The effect of <code>MultiViews</code> is as follows: if the + server receives a request for <code>/some/dir/foo</code>, if + <code>/some/dir</code> has <code>MultiViews</code> enabled, and + <code>/some/dir/foo</code> does <em>not</em> exist, then the + server reads the directory looking for files named foo.*, and + effectively fakes up a type map which names all those files, + assigning them the same media types and content-encodings it + would have if the client had asked for one of them by name. It + then chooses the best match to the client's requirements.</p> + + <p><code>MultiViews</code> may also apply to searches for the + file named by the <code>DirectoryIndex</code> directive, if the + server is trying to index a directory. If the configuration + files specify</p> +<pre> DirectoryIndex index -</PRE> - -then the server will arbitrate between <CODE>index.html</CODE> -and <CODE>index.html3</CODE> if both are present. If neither are -present, and <CODE>index.cgi</CODE> is there, the server will run it. - -<P> -If one of the files found when reading the directive is a CGI script, -it's not obvious what should happen. The code gives that case -special treatment --- if the request was a POST, or a GET with -QUERY_ARGS or PATH_INFO, the script is given an extremely high quality -rating, and generally invoked; otherwise it is given an extremely low -quality rating, which generally causes one of the other views (if any) -to be retrieved. 
- -<H2>The Negotiation Methods</H2> - -After Apache has obtained a list of the variants for a given resource, -either from a type-map file or from the filenames in the directory, it -invokes one of two methods to decide on the 'best' variant to -return, if any. It is not necessary to know any of the details of how -negotiation actually takes place in order to use Apache's content -negotiation features. However the rest of this document explains the -methods used for those interested. -<P> - -There are two negotiation methods: - -<OL> - -<LI><STRONG>Server driven negotiation with the Apache -algorithm</STRONG> is used in the normal case. The Apache algorithm is -explained in more detail below. When this algorithm is used, Apache -can sometimes 'fiddle' the quality factor of a particular dimension to -achieve a better result. The ways Apache can fiddle quality factors is -explained in more detail below. - -<LI><STRONG>Transparent content negotiation</STRONG> is used when the -browser specifically requests this through the mechanism defined in RFC -2295. This negotiation method gives the browser full control over -deciding on the 'best' variant, the result is therefore dependent on -the specific algorithms used by the browser. As part of the -transparent negotiation process, the browser can ask Apache to run the -'remote variant selection algorithm' defined in RFC 2296. - -</OL> - - -<H3>Dimensions of Negotiation</H3> - -<TABLE> -<TR valign="top"> -<TH>Dimension -<TH>Notes -<TR valign="top"> -<TD>Media Type -<TD>Browser indicates preferences with the Accept header field. Each item -can have an associated quality factor. Variant description can also -have a quality factor (the "qs" parameter). -<TR valign="top"> -<TD>Language -<TD>Browser indicates preferences with the Accept-Language header field. -Each item can have a quality factor. Variants can be associated with none, one -or more than one language. 
-<TR valign="top"> -<TD>Encoding -<TD>Browser indicates preference with the Accept-Encoding header field. -Each item can have a quality factor. -<TR valign="top"> -<TD>Charset -<TD>Browser indicates preference with the Accept-Charset header field. -Each item can have a quality factor. -Variants can indicate a charset as a parameter of the media type. -</TABLE> - -<H3>Apache Negotiation Algorithm</H3> - -<P> -Apache can use the following algorithm to select the 'best' variant -(if any) to return to the browser. This algorithm is not -further configurable. It operates as follows: - -<OL> -<LI>First, for each dimension of the negotiation, check the appropriate -<EM>Accept*</EM> header field and assign a quality to each -variant. If the <EM>Accept*</EM> header for any dimension implies that this -variant is not acceptable, eliminate it. If no variants remain, go -to step 4. - -<LI>Select the 'best' variant by a process of elimination. Each of the -following tests is applied in order. Any variants not selected at each -test are eliminated. After each test, if only one variant remains, -select it as the best match and proceed to step 3. If more than one -variant remains, move on to the next test. - -<OL> -<LI>Multiply the quality factor from the Accept header with the - quality-of-source factor for this variant's media type, and select - the variants with the highest value. - -<LI>Select the variants with the highest language quality factor. - -<LI>Select the variants with the best language match, using either the - order of languages in the Accept-Language header (if present), or else - the order of languages in the <CODE>LanguagePriority</CODE> - directive (if present). - -<LI>Select the variants with the highest 'level' media parameter - (used to give the version of text/html media types). - -<LI>Select variants with the best charset media parameters, - as given on the Accept-Charset header line. Charset ISO-8859-1 - is acceptable unless explicitly excluded. 
Variants with a - <CODE>text/*</CODE> media type but not explicitly associated - with a particular charset are assumed to be in ISO-8859-1. - -<LI>Select those variants which have associated - charset media parameters that are <EM>not</EM> ISO-8859-1. - If there are no such variants, select all variants instead. - -<LI>Select the variants with the best encoding. If there are - variants with an encoding that is acceptable to the user-agent, - select only these variants. Otherwise if there is a mix of encoded - and non-encoded variants, select only the unencoded variants. - If either all variants are encoded or all variants are not encoded, - select all variants. - -<LI>Select the variants with the smallest content length. - -<LI>Select the first variant of those remaining. This will be either the - first listed in the type-map file, or when variants are read from - the directory, the one whose file name comes first when sorted using - ASCII code order. - -</OL> - -<LI>The algorithm has now selected one 'best' variant, so return - it as the response. The HTTP response header Vary is set to indicate the - dimensions of negotiation (browsers and caches can use this - information when caching the resource). End. - -<LI>To get here means no variant was selected (because none are acceptable - to the browser). Return a 406 status (meaning "No acceptable representation") - with a response body consisting of an HTML document listing the - available variants. Also set the HTTP Vary header to indicate the - dimensions of variance. - -</OL> - -<H2><A NAME="better">Fiddling with Quality Values</A></H2> - -<P> -Apache sometimes changes the quality values from what would be -expected by a strict interpretation of the Apache negotiation -algorithm above. This is to get a better result from the algorithm for -browsers which do not send full or accurate information. 
Some of the -most popular browsers send Accept header information which would -otherwise result in the selection of the wrong variant in many -cases. If a browser sends full and correct information these fiddles -will not be applied. -<P> - -<H3>Media Types and Wildcards</H3> - -<P> -The Accept: request header indicates preferences for media types. It -can also include 'wildcard' media types, such as "image/*" or "*/*" -where the * matches any string. So a request including: -<PRE> +</pre> + then the server will arbitrate between <code>index.html</code> + and <code>index.html3</code> if both are present. If neither + are present, and <code>index.cgi</code> is there, the server + will run it. + + <p>If one of the files found when reading the directive is a + CGI script, it's not obvious what should happen. The code gives + that case special treatment --- if the request was a POST, or a + GET with QUERY_ARGS or PATH_INFO, the script is given an + extremely high quality rating, and generally invoked; otherwise + it is given an extremely low quality rating, which generally + causes one of the other views (if any) to be retrieved.</p> + + <h2>The Negotiation Methods</h2> + After Apache has obtained a list of the variants for a given + resource, either from a type-map file or from the filenames in + the directory, it invokes one of two methods to decide on the + 'best' variant to return, if any. It is not necessary to know + any of the details of how negotiation actually takes place in + order to use Apache's content negotiation features. However the + rest of this document explains the methods used for those + interested. + + <p>There are two negotiation methods:</p> + + <ol> + <li><strong>Server driven negotiation with the Apache + algorithm</strong> is used in the normal case. The Apache + algorithm is explained in more detail below. When this + algorithm is used, Apache can sometimes 'fiddle' the quality + factor of a particular dimension to achieve a better result. 
+ The ways Apache can fiddle quality factors is explained in + more detail below.</li> + + <li><strong>Transparent content negotiation</strong> is used + when the browser specifically requests this through the + mechanism defined in RFC 2295. This negotiation method gives + the browser full control over deciding on the 'best' variant, + the result is therefore dependent on the specific algorithms + used by the browser. As part of the transparent negotiation + process, the browser can ask Apache to run the 'remote + variant selection algorithm' defined in RFC 2296.</li> + </ol> + + <h3>Dimensions of Negotiation</h3> + + <table> + <tr valign="top"> + <th>Dimension</th> + + <th>Notes</th> + </tr> + + <tr valign="top"> + <td>Media Type</td> + + <td>Browser indicates preferences with the Accept header + field. Each item can have an associated quality factor. + Variant description can also have a quality factor (the + "qs" parameter).</td> + </tr> + + <tr valign="top"> + <td>Language</td> + + <td>Browser indicates preferences with the Accept-Language + header field. Each item can have a quality factor. Variants + can be associated with none, one or more than one + language.</td> + </tr> + + <tr valign="top"> + <td>Encoding</td> + + <td>Browser indicates preference with the Accept-Encoding + header field. Each item can have a quality factor.</td> + </tr> + + <tr valign="top"> + <td>Charset</td> + + <td>Browser indicates preference with the Accept-Charset + header field. Each item can have a quality factor. Variants + can indicate a charset as a parameter of the media + type.</td> + </tr> + </table> + + <h3>Apache Negotiation Algorithm</h3> + + <p>Apache can use the following algorithm to select the 'best' + variant (if any) to return to the browser. This algorithm is + not further configurable. 
It operates as follows:</p> + + <ol> + <li>First, for each dimension of the negotiation, check the + appropriate <em>Accept*</em> header field and assign a + quality to each variant. If the <em>Accept*</em> header for + any dimension implies that this variant is not acceptable, + eliminate it. If no variants remain, go to step 4.</li> + + <li> + Select the 'best' variant by a process of elimination. Each + of the following tests is applied in order. Any variants + not selected at each test are eliminated. After each test, + if only one variant remains, select it as the best match + and proceed to step 3. If more than one variant remains, + move on to the next test. + + <ol> + <li>Multiply the quality factor from the Accept header + with the quality-of-source factor for this variant's + media type, and select the variants with the highest + value.</li> + + <li>Select the variants with the highest language quality + factor.</li> + + <li>Select the variants with the best language match, + using either the order of languages in the + Accept-Language header (if present), or else the order of + languages in the <code>LanguagePriority</code> directive + (if present).</li> + + <li>Select the variants with the highest 'level' media + parameter (used to give the version of text/html media + types).</li> + + <li>Select variants with the best charset media + parameters, as given on the Accept-Charset header line. + Charset ISO-8859-1 is acceptable unless explicitly + excluded. Variants with a <code>text/*</code> media type + but not explicitly associated with a particular charset + are assumed to be in ISO-8859-1.</li> + + <li>Select those variants which have associated charset + media parameters that are <em>not</em> ISO-8859-1. If + there are no such variants, select all variants + instead.</li> + + <li>Select the variants with the best encoding. If there + are variants with an encoding that is acceptable to the + user-agent, select only these variants. 
Otherwise if + there is a mix of encoded and non-encoded variants, + select only the unencoded variants. If either all + variants are encoded or all variants are not encoded, + select all variants.</li> + + <li>Select the variants with the smallest content + length.</li> + + <li>Select the first variant of those remaining. This + will be either the first listed in the type-map file, or + when variants are read from the directory, the one whose + file name comes first when sorted using ASCII code + order.</li> + </ol> + </li> + + <li>The algorithm has now selected one 'best' variant, so + return it as the response. The HTTP response header Vary is + set to indicate the dimensions of negotiation (browsers and + caches can use this information when caching the resource). + End.</li> + + <li>To get here means no variant was selected (because none + are acceptable to the browser). Return a 406 status (meaning + "No acceptable representation") with a response body + consisting of an HTML document listing the available + variants. Also set the HTTP Vary header to indicate the + dimensions of variance.</li> + </ol> + + <h2><a id="better" name="better">Fiddling with Quality + Values</a></h2> + + <p>Apache sometimes changes the quality values from what would + be expected by a strict interpretation of the Apache + negotiation algorithm above. This is to get a better result + from the algorithm for browsers which do not send full or + accurate information. Some of the most popular browsers send + Accept header information which would otherwise result in the + selection of the wrong variant in many cases. If a browser + sends full and correct information these fiddles will not be + applied.</p> + + <h3>Media Types and Wildcards</h3> + + <p>The Accept: request header indicates preferences for media + types. It can also include 'wildcard' media types, such as + "image/*" or "*/*" where the * matches any string. 
So a request + including:</p> +<pre> Accept: image/*, */* -</PRE> - -would indicate that any type starting "image/" is acceptable, -as is any other type (so the first "image/*" is redundant). Some -browsers routinely send wildcards in addition to explicit types they -can handle. For example: -<PRE> +</pre> + would indicate that any type starting "image/" is acceptable, + as is any other type (so the first "image/*" is redundant). + Some browsers routinely send wildcards in addition to explicit + types they can handle. For example: +<pre> Accept: text/html, text/plain, image/gif, image/jpeg, */* -</PRE> - -The intention of this is to indicate that the explicitly -listed types are preferred, but if a different representation is -available, that is ok too. However under the basic algorithm, as given -above, the */* wildcard has exactly equal preference to all the other -types, so they are not being preferred. The browser should really have -sent a request with a lower quality (preference) value for */*, such -as: -<PRE> +</pre> + The intention of this is to indicate that the explicitly listed + types are preferred, but if a different representation is + available, that is ok too. However under the basic algorithm, + as given above, the */* wildcard has exactly equal preference + to all the other types, so they are not being preferred. The + browser should really have sent a request with a lower quality + (preference) value for */*, such as: +<pre> Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01 -</PRE> - -The explicit types have no quality factor, so they default to a -preference of 1.0 (the highest). The wildcard */* is given -a low preference of 0.01, so other types will only be returned if -no variant matches an explicitly listed type. -<P> - -If the Accept: header contains <EM>no</EM> q factors at all, Apache sets -the q value of "*/*", if present, to 0.01 to emulate the desired -behavior.
It also sets the q value of wildcards of the format -"type/*" to 0.02 (so these are preferred over matches against -"*/*". If any media type on the Accept: header contains a q factor, -these special values are <EM>not</EM> applied, so requests from browsers -which send the correct information to start with work as expected. - -<H3>Variants with no Language</H3> - -<P> -If some of the variants for a particular resource have a language -attribute, and some do not, those variants with no language -are given a very low language quality factor of 0.001.<P> - -The reason for setting this language quality factor for -variant with no language to a very low value is to allow -for a default variant which can be supplied if none of the -other variants match the browser's language preferences. - -For example, consider the situation with three variants: - -<UL> -<LI>foo.en.html, language en -<LI>foo.fr.html, language en -<LI>foo.html, no language -</UL> - -<P> -The meaning of a variant with no language is that it is -always acceptable to the browser. If the request Accept-Language -header includes either en or fr (or both) one of foo.en.html -or foo.fr.html will be returned. If the browser does not list -either en or fr as acceptable, foo.html will be returned instead. - -<H2>Extensions to Transparent Content Negotiation</H2> - -Apache extends the transparent content negotiation protocol (RFC 2295) -as follows. A new <CODE> {encoding ..}</CODE> element is used in -variant lists to label variants which are available with a specific -content-encoding only. The implementation of the -RVSA/1.0 algorithm (RFC 2296) is extended to recognize encoded -variants in the list, and to use them as candidate variants whenever -their encodings are acceptable according to the Accept-Encoding -request header. The RVSA/1.0 implementation does not round computed -quality factors to 5 decimal places before choosing the best variant. 
- -<H2>Note on hyperlinks and naming conventions</H2> - -<P> -If you are using language negotiation you can choose between -different naming conventions, because files can have more than one -extension, and the order of the extensions is normally irrelevant -(see the <A HREF="mod/mod_mime.html#multipleext">mod_mime</A> -documentation for details). -<P> -A typical file has a MIME-type extension (<EM>e.g.</EM>, <SAMP>html</SAMP>), -maybe an encoding extension (<EM>e.g.</EM>, <SAMP>gz</SAMP>), and of course a -language extension (<EM>e.g.</EM>, <SAMP>en</SAMP>) when we have different -language variants of this file. - -<P> -Examples: -<UL> -<LI>foo.en.html -<LI>foo.html.en -<LI>foo.en.html.gz -</UL> - -<P> -Here some more examples of filenames together with valid and invalid -hyperlinks: -</P> - -<TABLE BORDER=1 CELLPADDING=8 CELLSPACING=0> -<TR> - <TH>Filename</TH> - <TH>Valid hyperlink</TH> - <TH>Invalid hyperlink</TH> -</TR> -<TR> - <TD><EM>foo.html.en</EM></TD> - <TD>foo<BR> - foo.html</TD> - <TD>-</TD> -</TR> -<TR> - <TD><EM>foo.en.html</EM></TD> - <TD>foo</TD> - <TD>foo.html</TD> -</TR> -<TR> - <TD><EM>foo.html.en.gz</EM></TD> - <TD>foo<BR> - foo.html</TD> - <TD>foo.gz<BR> - foo.html.gz</TD> -</TR> -<TR> - <TD><EM>foo.en.html.gz</EM></TD> - <TD>foo</TD> - <TD>foo.html<BR> - foo.html.gz<BR> - foo.gz</TD> -</TR> -<TR> - <TD><EM>foo.gz.html.en</EM></TD> - <TD>foo<BR> - foo.gz<BR> - foo.gz.html</TD> - <TD>foo.html</TD> -</TR> -<TR> - <TD><EM>foo.html.gz.en</EM></TD> - <TD>foo<BR> - foo.html<BR> - foo.html.gz</TD> - <TD>foo.gz</TD> -</TR> -</TABLE> - -<P> -Looking at the table above you will notice that it is always possible to -use the name without any extensions in an hyperlink (<EM>e.g.</EM>, <SAMP>foo</SAMP>). -The advantage is that you can hide the actual type of a -document rsp. file and can change it later, <EM>e.g.</EM>, from <SAMP>html</SAMP> -to <SAMP>shtml</SAMP> or <SAMP>cgi</SAMP> without changing any -hyperlink references. 
- -<P> -If you want to continue to use a MIME-type in your hyperlinks (<EM>e.g.</EM> -<SAMP>foo.html</SAMP>) the language extension (including an encoding extension -if there is one) must be on the right hand side of the MIME-type extension -(<EM>e.g.</EM>, <SAMP>foo.html.en</SAMP>). - - -<H2>Note on Caching</H2> - -<P> -When a cache stores a representation, it associates it with the request URL. -The next time that URL is requested, the cache can use the stored -representation. But, if the resource is negotiable at the server, -this might result in only the first requested variant being cached and -subsequent cache hits might return the wrong response. To prevent this, -Apache normally marks all responses that are returned after content negotiation -as non-cacheable by HTTP/1.0 clients. Apache also supports the HTTP/1.1 -protocol features to allow caching of negotiated responses. <P> - -For requests which come from a HTTP/1.0 compliant client (either a -browser or a cache), the directive <TT>CacheNegotiatedDocs</TT> can be -used to allow caching of responses which were subject to negotiation. -This directive can be given in the server config or virtual host, and -takes no arguments. It has no effect on requests from HTTP/1.1 clients. - -<!--#include virtual="footer.html" --> -</BODY> -</HTML> +</pre> + The explicit types have no quality factor, so they default to a + preference of 1.0 (the highest). The wildcard */* is given a + low preference of 0.01, so other types will only be returned if + no variant matches an explicitly listed type. + + <p>If the Accept: header contains <em>no</em> q factors at all, + Apache sets the q value of "*/*", if present, to 0.01 to + emulate the desired behavior. It also sets the q value of + wildcards of the format "type/*" to 0.02 (so these are + preferred over matches against "*/*". 
If any media type on the + Accept: header contains a q factor, these special values are + <em>not</em> applied, so requests from browsers which send the + correct information to start with work as expected.</p> + + <h3>Variants with no Language</h3> + + <p>If some of the variants for a particular resource have a + language attribute, and some do not, those variants with no + language are given a very low language quality factor of + 0.001.</p> + + <p>The reason for setting this language quality factor for + variants with no language to a very low value is to allow for a + default variant which can be supplied if none of the other + variants match the browser's language preferences. For example, + consider the situation with three variants:</p> + + <ul> + <li>foo.en.html, language en</li> + + <li>foo.fr.html, language fr</li> + + <li>foo.html, no language</li> + </ul> + + <p>The meaning of a variant with no language is that it is + always acceptable to the browser. If the request + Accept-Language header includes either en or fr (or both), one + of foo.en.html or foo.fr.html will be returned. If the browser + does not list either en or fr as acceptable, foo.html will be + returned instead.</p> + + <h2>Extensions to Transparent Content Negotiation</h2> + Apache extends the transparent content negotiation protocol + (RFC 2295) as follows. A new <code>{encoding ..}</code> element + is used in variant lists to label variants which are available + with a specific content-encoding only. The implementation of + the RVSA/1.0 algorithm (RFC 2296) is extended to recognize + encoded variants in the list, and to use them as candidate + variants whenever their encodings are acceptable according to + the Accept-Encoding request header. The RVSA/1.0 implementation + does not round computed quality factors to 5 decimal places + before choosing the best variant.
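The wildcard adjustment described under "Media Types and Wildcards" in this diff can be sketched roughly as follows. This is only an illustration in Python, not Apache's code: the function names `parse_accept` and `fiddle_wildcards` are hypothetical, and the parsing is deliberately simplified (it ignores media-type parameters other than `q`).

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q, explicit_q) tuples.

    Simplified, hypothetical parser: only the "q" parameter is recognized.
    """
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0]
        if not media_range:
            continue
        q, explicit = 1.0, False  # items without a q factor default to 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q, explicit = float(value), True
        items.append((media_range, q, explicit))
    return items


def fiddle_wildcards(header):
    """Apply the 0.01 / 0.02 wildcard values described in the text.

    The special values are used only when NO item carries a q factor.
    """
    items = parse_accept(header)
    if any(explicit for _, _, explicit in items):
        # The browser sent q factors, so its values are left untouched.
        return [(m, q) for m, q, _ in items]
    adjusted = []
    for media_range, q, _ in items:
        if media_range == "*/*":
            q = 0.01  # lowest preference: matches anything
        elif media_range.endswith("/*"):
            q = 0.02  # "type/*" is preferred over "*/*"
        adjusted.append((media_range, q))
    return adjusted
```

For instance, `fiddle_wildcards("text/html, image/*, */*")` lowers the two wildcards while leaving `text/html` at 1.0, whereas a header that already contains `q=` anywhere is returned with its values unchanged.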
+ + <h2>Note on hyperlinks and naming conventions</h2> + + <p>If you are using language negotiation you can choose between + different naming conventions, because files can have more than + one extension, and the order of the extensions is normally + irrelevant (see the <a + href="mod/mod_mime.html#multipleext">mod_mime</a> documentation + for details).</p> + + <p>A typical file has a MIME-type extension (<em>e.g.</em>, + <samp>html</samp>), maybe an encoding extension (<em>e.g.</em>, + <samp>gz</samp>), and of course a language extension + (<em>e.g.</em>, <samp>en</samp>) when we have different + language variants of this file.</p> + + <p>Examples:</p> + + <ul> + <li>foo.en.html</li> + + <li>foo.html.en</li> + + <li>foo.en.html.gz</li> + </ul> + + <p>Here some more examples of filenames together with valid and + invalid hyperlinks:</p> + + <table border="1" cellpadding="8" cellspacing="0"> + <tr> + <th>Filename</th> + + <th>Valid hyperlink</th> + + <th>Invalid hyperlink</th> + </tr> + + <tr> + <td><em>foo.html.en</em></td> + + <td>foo<br /> + foo.html</td> + + <td>-</td> + </tr> + + <tr> + <td><em>foo.en.html</em></td> + + <td>foo</td> + + <td>foo.html</td> + </tr> + + <tr> + <td><em>foo.html.en.gz</em></td> + + <td>foo<br /> + foo.html</td> + + <td>foo.gz<br /> + foo.html.gz</td> + </tr> + + <tr> + <td><em>foo.en.html.gz</em></td> + + <td>foo</td> + + <td>foo.html<br /> + foo.html.gz<br /> + foo.gz</td> + </tr> + + <tr> + <td><em>foo.gz.html.en</em></td> + + <td>foo<br /> + foo.gz<br /> + foo.gz.html</td> + + <td>foo.html</td> + </tr> + + <tr> + <td><em>foo.html.gz.en</em></td> + + <td>foo<br /> + foo.html<br /> + foo.html.gz</td> + + <td>foo.gz</td> + </tr> + </table> + + <p>Looking at the table above you will notice that it is always + possible to use the name without any extensions in an hyperlink + (<em>e.g.</em>, <samp>foo</samp>). The advantage is that you + can hide the actual type of a document rsp. 
file and can change + it later, <em>e.g.</em>, from <samp>html</samp> to + <samp>shtml</samp> or <samp>cgi</samp> without changing any + hyperlink references.</p> + + <p>If you want to continue to use a MIME-type in your + hyperlinks (<em>e.g.</em> <samp>foo.html</samp>) the language + extension (including an encoding extension if there is one) + must be on the right hand side of the MIME-type extension + (<em>e.g.</em>, <samp>foo.html.en</samp>).</p> + + <h2>Note on Caching</h2> + + <p>When a cache stores a representation, it associates it with + the request URL. The next time that URL is requested, the cache + can use the stored representation. But, if the resource is + negotiable at the server, this might result in only the first + requested variant being cached and subsequent cache hits might + return the wrong response. To prevent this, Apache normally + marks all responses that are returned after content negotiation + as non-cacheable by HTTP/1.0 clients. Apache also supports the + HTTP/1.1 protocol features to allow caching of negotiated + responses.</p> + + <p>For requests which come from a HTTP/1.0 compliant client + (either a browser or a cache), the directive + <tt>CacheNegotiatedDocs</tt> can be used to allow caching of + responses which were subject to negotiation. This directive can + be given in the server config or virtual host, and takes no + arguments. It has no effect on requests from HTTP/1.1 clients. + <!--#include virtual="footer.html" --> + </p> + </body> +</html> + |
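The filename table in the "Note on hyperlinks and naming conventions" section is consistent with a simple prefix rule: a hyperlink resolves when it is a dot-delimited prefix of the file name. The Python sketch below encodes that reading. It is an assumption inferred from the rows of the table, not a description of how the server actually matches files, and the name `link_matches` is hypothetical.

```python
def link_matches(link, filename):
    """Return True if `link` is the file name itself or a prefix of it
    that ends exactly at an extension boundary (a dot)."""
    return filename == link or filename.startswith(link + ".")


# Rows taken from the table: (filename, valid links, invalid links).
TABLE = [
    ("foo.html.en", ["foo", "foo.html"], []),
    ("foo.en.html", ["foo"], ["foo.html"]),
    ("foo.html.en.gz", ["foo", "foo.html"], ["foo.gz", "foo.html.gz"]),
    ("foo.en.html.gz", ["foo"], ["foo.html", "foo.html.gz", "foo.gz"]),
    ("foo.gz.html.en", ["foo", "foo.gz", "foo.gz.html"], ["foo.html"]),
    ("foo.html.gz.en", ["foo", "foo.html", "foo.html.gz"], ["foo.gz"]),
]


def table_agrees_with_prefix_rule():
    """Check every table row against the prefix rule."""
    for filename, valid, invalid in TABLE:
        if not all(link_matches(link, filename) for link in valid):
            return False
        if any(link_matches(link, filename) for link in invalid):
            return False
    return True
```

Under this reading, putting the language extension to the right of the MIME-type extension (`foo.html.en`) keeps `foo.html` usable as a link, which is the advice given in the text.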