Apache HTTP Server Version 1.3
An In-Depth Discussion of VirtualHost Matching
This is a very rough document that was probably out of date
the moment it was written. It attempts to explain exactly what
the code does when deciding what virtual host to serve a hit
from. It's provided on the assumption that something is better
than nothing. The server version under discussion is Apache
1.2.
If you just want to "make it work" without understanding
how, there's a What Works section at
the bottom.
Config File Parsing
There is a main_server which consists of all the definitions
appearing outside of VirtualHost sections. There
are virtual servers, called vhosts, which are defined
by VirtualHost
sections.
The directives Port, ServerName,
ServerPath,
and ServerAlias
can appear anywhere within the definition of a server. However,
each appearance overrides the previous appearance (within that
server).
The default value of the Port field for
main_server is 80. The main_server has no default
ServerName, ServerPath, or
ServerAlias.
In the absence of any Listen
directives, the Port directive in the main_server (the final
one, if there are several) indicates which port httpd will
listen on.
The Port and ServerName directives
for any server, main or virtual, are used when generating URLs,
such as during redirects.
Each address appearing in the VirtualHost
directive can have an optional port. If the port is unspecified
it defaults to the value of the main_server's most recent
Port statement. The special port *
indicates a wildcard that matches any port. Collectively, the
entire set of addresses (including multiple A
record results from DNS lookups) is called the vhost's
address set.
The magic _default_ address has significance
during the matching algorithm. It essentially matches any
unspecified address.
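The address rules above can be sketched in a few lines of Python. This is not Apache's C code: the function names `parse_vhost_address` and `port_matches`, and the use of `None` to stand for the `*` wildcard, are hypothetical choices for illustration, and the DNS expansion of a name into several A records is left out.

```python
from typing import Optional, Tuple

WILDCARD_PORT = None  # hypothetical stand-in for the "*" port


def parse_vhost_address(spec: str, main_port: int) -> Tuple[str, Optional[int]]:
    """Turn one VirtualHost address such as "10.0.0.1", "10.0.0.1:8080"
    or "_default_:*" into an (address, port) pair."""
    host, sep, port = spec.rpartition(":")
    if not sep:
        # No port given: use the main_server's most recent Port value.
        return spec, main_port
    if port == "*":
        # "*" is a wildcard that matches any port.
        return host, WILDCARD_PORT
    return host, int(port)


def port_matches(entry_port: Optional[int], port: int) -> bool:
    """A wildcard entry matches every port; otherwise the ports must be equal."""
    return entry_port is WILDCARD_PORT or entry_port == port


# Assume the main_server's most recent Port directive said 80.
print(parse_vhost_address("10.0.0.1", 80))       # ('10.0.0.1', 80)
print(parse_vhost_address("10.0.0.1:8080", 80))  # ('10.0.0.1', 8080)
print(parse_vhost_address("_default_:*", 80))    # ('_default_', None)
```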
After parsing the VirtualHost directive, the
vhost server is given a default Port equal to the
port assigned to the first name in its VirtualHost
directive. The complete list of names in the
VirtualHost directive is treated just like a
ServerAlias (but is not overridden by any
ServerAlias statement). Note that subsequent
Port statements for this vhost will not affect the
ports assigned in the address set.
All vhosts are stored in a list in the reverse of the
order in which they appear in the config file. For example, if
the config file is:
<VirtualHost A>
...
</VirtualHost>
<VirtualHost B>
...
</VirtualHost>
<VirtualHost C>
...
</VirtualHost>
Then the list will be ordered: main_server, C, B, A. Keep this
in mind.
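A minimal Python sketch of this ordering, assuming each newly parsed vhost is pushed onto the front of the vhost list (the function `build_server_list` is hypothetical, not an Apache routine):

```python
from typing import List


def build_server_list(main_server: str, vhosts_in_config_order: List[str]) -> List[str]:
    """Return main_server followed by the vhosts in reverse config order."""
    vhosts: List[str] = []
    for name in vhosts_in_config_order:
        # Each newly parsed vhost goes to the front of the vhost list.
        vhosts.insert(0, name)
    return [main_server] + vhosts


print(build_server_list("main_server", ["A", "B", "C"]))
# ['main_server', 'C', 'B', 'A']
```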
After parsing has completed, the list of servers is scanned,
and various merges and default values are set. In
particular:
- If a vhost has no
ServerAdmin,
ResourceConfig,
AccessConfig,
Timeout,
KeepAliveTimeout,
KeepAlive,
MaxKeepAliveRequests,
or SendBufferSize
directive then the respective value is inherited from the
main_server. (That is, inherited from whatever the final
setting of that value is in the main_server.)
- The "lookup defaults" that define the default directory
permissions for a vhost are merged with those of the main
server. This includes any per-directory configuration
information for any module.
- The per-server configs for each module from the
main_server are merged into the vhost server.
Essentially, the main_server is treated as "defaults" or a
"base" on which to build each vhost. But the positioning of
these main_server definitions in the config file is largely
irrelevant -- the entire config of the main_server has been
parsed when this final merging occurs. So even a main_server
definition that appears after a vhost definition can still
affect that vhost.
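The inheritance rule can be sketched as a dictionary merge. This is a simplified model under stated assumptions: configs are plain Python dicts keyed by directive name, `merge_vhost` is a hypothetical helper, and the lookup-default and per-module merges are not modelled.

```python
# The directives listed above whose unset values a vhost inherits.
INHERITED = (
    "ServerAdmin", "ResourceConfig", "AccessConfig", "Timeout",
    "KeepAliveTimeout", "KeepAlive", "MaxKeepAliveRequests", "SendBufferSize",
)


def merge_vhost(main_server: dict, vhost: dict) -> dict:
    """Return a copy of the vhost config with each inherited setting it
    left unset filled in from the main_server's final value."""
    merged = dict(vhost)
    for key in INHERITED:
        if merged.get(key) is None and key in main_server:
            merged[key] = main_server[key]
    return merged


# The whole main_server config is parsed before merging, so a Timeout
# that appears *after* the vhost in the file still wins.
main_server = {"Timeout": 300}
vhost = {"ServerAdmin": "web@example.com"}
main_server["Timeout"] = 600  # a later Timeout line in the main_server
print(merge_vhost(main_server, vhost))
# {'ServerAdmin': 'web@example.com', 'Timeout': 600}
```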
If the main_server has no ServerName at this
point, then the hostname of the machine that httpd is running
on is used instead. We will call the IP addresses returned by a
DNS lookup on the main_server's ServerName the main_server
address set.
Now a pass is made through the vhosts to fill in any missing
ServerName fields and to classify the vhost as
either an IP-based vhost or a name-based
vhost. A vhost is considered a name-based vhost if any address in
its address set overlaps the main_server address set (and the port
associated with that address matches the main_server's Port).
Otherwise it is considered an IP-based vhost.
For any undefined ServerName fields, a
name-based vhost defaults to the address given first in the
VirtualHost statement defining the vhost. Any
vhost that includes the magic _default_ wildcard
is given the same ServerName as the main_server.
Otherwise the vhost (which is necessarily an IP-based vhost) is
given a ServerName based on the result of a
reverse DNS lookup on the first address given in the
VirtualHost statement.
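The classification and ServerName defaulting steps might look like this in Python. `VHost`, `classify` and the `reverse_dns` callback are hypothetical; the callback stands in for a real reverse DNS lookup, and treating a wildcard port as not matching the main_server's Port is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class VHost:
    # Address set as (address, port) pairs; a port of None stands for "*".
    addresses: List[Tuple[str, Optional[int]]]
    server_name: Optional[str] = None
    name_based: bool = False


def classify(vhost: VHost, main_addresses: Set[str], main_port: int,
             main_server_name: str, reverse_dns: Callable[[str], str]) -> VHost:
    # Name-based if any address overlaps the main_server address set
    # with a port equal to the main_server's Port.
    vhost.name_based = any(addr in main_addresses and port == main_port
                           for addr, port in vhost.addresses)
    if vhost.server_name is None:
        first = vhost.addresses[0][0]
        if vhost.name_based:
            vhost.server_name = first                # first address given
        elif any(addr == "_default_" for addr, _ in vhost.addresses):
            vhost.server_name = main_server_name     # same as main_server
        else:
            vhost.server_name = reverse_dns(first)   # reverse DNS lookup
    return vhost


rdns = lambda ip: "host-" + ip  # hypothetical reverse DNS stand-in
v = classify(VHost([("10.0.0.2", 80)]), {"10.0.0.1"}, 80, "www.example.com", rdns)
print(v.name_based, v.server_name)  # False host-10.0.0.2
```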
Vhost Matching
Apache 1.3 differs from what is documented here, and
documentation still has to be written.
The server determines which vhost to use for a request as
follows:
find_virtual_server: When the connection is
first made by the client, the local IP address (the IP address
to which the client connected) is looked up in the server list.
A vhost is matched if it is an IP-based vhost, the IP address
matches and the port matches (taking into account
wildcards).
If no vhost matches, then the last occurrence of a _default_
address in the server list, if there is one, is matched. Recalling
the reverse ordering of the server list mentioned above, this is
the first occurrence of _default_ in the config file.
In any event, if nothing above has matched, then the
main_server is matched.
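The find_virtual_server search can be sketched as follows, assuming the server list is modelled as a Python list with the main_server first and the vhosts in reverse config order. `Server` and its fields are hypothetical, and the `_default_` fallback does not check ports because the description above does not mention a port test for it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Server:
    name: str
    addresses: List[Tuple[str, Optional[int]]]  # port None = "*" wildcard
    name_based: bool = False


def find_virtual_server(servers: List[Server], local_ip: str, local_port: int) -> Server:
    """servers[0] is the main_server; the rest are vhosts in reverse
    config-file order."""
    main_server, vhosts = servers[0], servers[1:]

    def port_ok(port: Optional[int]) -> bool:
        return port is None or port == local_port

    # 1. The first IP-based vhost in the list whose address and port match.
    for vh in vhosts:
        if not vh.name_based and any(addr == local_ip and port_ok(port)
                                     for addr, port in vh.addresses):
            return vh
    # 2. The last _default_ vhost in the list (the first one in the file).
    defaults = [vh for vh in vhosts
                if any(addr == "_default_" for addr, _ in vh.addresses)]
    if defaults:
        return defaults[-1]
    # 3. Otherwise the main_server.
    return main_server


# Config order A (_default_), B (name-based on the main IP), C (IP-based),
# so the list is main_server, C, B, A.
servers = [
    Server("main", [("10.0.0.1", 80)]),
    Server("C", [("10.0.0.3", 80)]),
    Server("B", [("10.0.0.1", 80)], name_based=True),
    Server("A", [("_default_", None)]),
]
print(find_virtual_server(servers, "10.0.0.3", 80).name)  # C
print(find_virtual_server(servers, "10.0.0.1", 80).name)  # A
```

In this example a connection to the main IP lands on the _default_ vhost A rather than the main_server, which is the situation the Observations below warn about.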
The vhost resulting from the above search is stored with
data about the connection. We'll call this the connection
vhost. The connection vhost is constant over all requests
in a particular TCP/IP session -- that is, over all requests in
a KeepAlive/persistent session.
For each request made on the connection the following
sequence of events further determines the actual vhost that
will be used to serve the request.
check_fulluri: If the requestURI is an
absoluteURI, that is, it includes http://hostname/,
then an attempt is made to determine whether the hostname's address
(and optional port) matches that of the connection vhost. If it
does, then the hostname portion of the URI is saved as the
request_hostname. If it does not match, then the URI
remains untouched. Note: to achieve this
address comparison, the hostname supplied goes through a DNS
lookup unless it matches the ServerName or the
local IP address of the client's socket.
parse_uri: If the URI begins with a protocol
(e.g., http: or ftp:), then the
request is considered a proxy request. Note that even though we
may have stripped an http://hostname/ in the
previous step, this could still be a proxy request.
read_request: If the request does not have a
hostname from the earlier step, then any Host:
header sent by the client is used as the request hostname.
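The three steps that establish the request hostname (check_fulluri, parse_uri, read_request) can be combined into one hedged Python sketch. The function name and parameters are hypothetical, `resolve` stands in for a forward DNS lookup, and defaulting an absent URI port to 80 is an assumption.

```python
import re
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlsplit

PROTOCOL = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def determine_request_hostname(
    uri: str,
    host_header: Optional[str],
    vhost_addresses: List[Tuple[str, Optional[int]]],
    server_name: str,
    local_ip: str,
    resolve: Callable[[str], List[str]],
) -> Tuple[Optional[str], str, bool]:
    """Return (request_hostname, uri, is_proxy) for one request."""
    hostname = None
    # check_fulluri: an absoluteURI naming the connection vhost is stripped.
    if uri.startswith("http://"):
        parts = urlsplit(uri)
        name, port = parts.hostname, parts.port or 80
        # No DNS lookup if the name is the ServerName or the local IP.
        ips = [local_ip] if name in (server_name, local_ip) else resolve(name)
        if any(ip in ips and (p is None or p == port) for ip, p in vhost_addresses):
            hostname = name
            uri = uri[len("http://" + parts.netloc):] or "/"
    # parse_uri: anything still starting with a protocol is a proxy request.
    is_proxy = bool(PROTOCOL.match(uri))
    # read_request: otherwise fall back to the Host: header.
    if hostname is None and host_header:
        hostname = host_header
    return hostname, uri, is_proxy


table = {"www.example.com": ["10.0.0.1"], "other.org": ["192.0.2.9"]}
resolve = lambda n: table.get(n, [])  # hypothetical DNS stand-in
addrs = [("10.0.0.1", 80)]
print(determine_request_hostname("http://www.example.com/x?y=1", None, addrs,
                                 "main.example.com", "10.0.0.1", resolve))
# ('www.example.com', '/x?y=1', False)
```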
check_hostalias: If the request now has a
hostname, then an attempt is made to match for this hostname.
The first step of this match is to compare any port, if one was
given in the request, against the Port field of
the connection vhost. If there's a mismatch, then the vhost used
for the request is the connection vhost. (This is a bug; see
Observations.)
If the port matches, then httpd scans the list of vhosts
starting with the next server after the
connection vhost. This scan does not stop at the first match;
it goes through all possible vhosts and in the end
uses the last match it found. The comparisons performed are as
follows:
- Compare the request hostname:port with the vhost
ServerName and Port.
- Compare the request hostname against any and all
addresses given in the
VirtualHost directive for
this vhost.
- Compare the request hostname against the
ServerAlias given for the vhost.
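A Python sketch of the check_hostalias scan, assuming the server list layout described above (main_server first, vhosts in reverse config order). `VHost` and its fields are hypothetical; when the request carries no port the sketch compares only the name against ServerName, and when nothing matches it keeps the connection vhost. Both are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VHost:
    server_name: str
    port: int
    vhost_names: List[str] = field(default_factory=list)  # names in the VirtualHost directive
    aliases: List[str] = field(default_factory=list)      # ServerAlias names


def check_hostalias(servers: List[VHost], conn_index: int,
                    hostname: str, port: Optional[int] = None) -> VHost:
    """servers is main_server followed by vhosts in reverse config order;
    servers[conn_index] is the connection vhost."""
    conn = servers[conn_index]
    if port is not None and port != conn.port:
        return conn  # port mismatch: keep the connection vhost
    match = conn
    # Scan every server after the connection vhost and keep the LAST match.
    for vh in servers[conn_index + 1:]:
        if ((hostname == vh.server_name and (port is None or port == vh.port))
                or hostname in vh.vhost_names
                or hostname in vh.aliases):
            match = vh
    return match


servers = [
    VHost("main.example", 80),
    VHost("c.example", 80, ["10.0.0.3"]),
    VHost("b.example", 80, ["b.example"], ["www.b.example"]),
    VHost("a.example", 80, ["10.0.0.2"]),
]
print(check_hostalias(servers, 0, "www.b.example").server_name)  # b.example
print(check_hostalias(servers, 2, "c.example").server_name)      # b.example
```

The second call shows why not every vhost is reachable: the scan starts after the connection vhost, so C, which sits earlier in the list, is never examined.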
check_serverpath: If the request has no
hostname (back up a few paragraphs) then a scan similar to the
one in check_hostalias is performed to match any
ServerPath directives given in the vhosts. Note
that the last match is used regardless (again
consider the ordering of the virtual hosts).
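The check_serverpath scan can be sketched in Python as follows. `Server` and `check_serverpath` are hypothetical, and the ServerPath test is assumed to be a plain string-prefix comparison, which is what makes the prefix problem in the Observations below possible.

```python
from typing import List, NamedTuple, Optional


class Server(NamedTuple):
    name: str
    server_path: Optional[str]  # None when no ServerPath is configured


def check_serverpath(servers: List[Server], conn_index: int, uri: str) -> Server:
    """Scan the servers after the connection vhost and return the last one
    whose ServerPath is a prefix of the URI (or the connection vhost)."""
    match = servers[conn_index]
    for server in servers[conn_index + 1:]:
        if server.server_path is not None and uri.startswith(server.server_path):
            match = server  # later matches overwrite earlier ones
    return match


# Config order: X (ServerPath /abc) then Y (ServerPath /abcdef),
# so the server list is main_server, Y, X.
servers = [Server("main", None), Server("Y", "/abcdef"), Server("X", "/abc")]
print(check_serverpath(servers, 0, "/abcdef/index.html").name)  # X
```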
Observations
- It is difficult to define an IP-based vhost for the
machine's "main IP address". You essentially have to create a
bogus
ServerName for the main_server that does
not match the machine's IPs.
-
During the scans in both
check_hostalias and
check_serverpath no check is made that the
vhost being scanned is actually a name-based vhost. This
means, for example, that it's possible to match an IP-based
vhost through another address. But because the scan starts
in the vhost list at the first vhost that matched the local
IP address of the connection, not all IP-based vhosts can
be matched.
Consider the config file above with three vhosts A, B,
C. Suppose that B is a name-based vhost, and A and C are
IP-based vhosts. If a request comes in on B's or C's address
containing a header "Host: A" then it will be
served from A's config. If a request comes in on A's
address then it will always be served from A's config
regardless of any Host: header.
-
Unless you have a _default_ vhost, it doesn't
matter if you mix name-based vhosts in amongst IP-based
vhosts. During the
find_virtual_server phase
above, no name-based vhost will be matched, so the
main_server will remain the connection vhost. Then scans
will cover all vhosts in the vhost list.
If you do have a _default_ vhost, then you
cannot place name-based vhosts after it in the config.
This is because on any connection to the main server IPs
the connection vhost will always be the
_default_ vhost, since none of the name-based vhosts
are considered during find_virtual_server.
- You should never specify DNS names in
VirtualHost directives because it will force
your server to rely on DNS to boot. Furthermore it poses a
security threat if you do not control the DNS for all the
domains listed. There's more
information available on this and the next two
topics.
- ServerName should always be set for each
vhost. Otherwise a DNS lookup is required for each
vhost.
- A DNS lookup is always required for the main_server's
ServerName (or to generate that if it isn't
specified in the config).
- If a
ServerPath directive exists which is a
prefix of another ServerPath directive that
appears later in the configuration file, then the former will
always be matched and the latter will never be matched. (That
is assuming that no Host header was available to disambiguate
the two.)
- If a vhost that would otherwise be a name-based vhost includes
a
Port statement that doesn't match the
main_server Port then it will be considered an
IP-based vhost. Then find_virtual_server will
match it (because the ports associated with each address in
the address set default to the port of the main_server) as
the connection vhost. Then check_hostalias will
refuse to check any other name-based vhost because of the
port mismatch. The result is that the vhost will steal all
hits going to the main_server address.
- If two IP-based vhosts have an address in common, the
vhost appearing later in the file is always matched. Such a
thing might happen inadvertently. If the config has
name-based vhosts and for some reason the main_server
ServerName resolves to the wrong address then
all the name-based vhosts will be parsed as IP-based vhosts.
Then the last of them will steal all the hits.
- The last name-based vhost in the config is always matched
for any hit which doesn't match one of the other name-based
vhosts.
What Works
In addition to the tips on the DNS Issues page, here are some
further tips:
- Place all main_server definitions before any VirtualHost
definitions. (This is to aid the readability of the
configuration -- the post-config merging process makes it
non-obvious that definitions mixed in around virtualhosts
might affect all virtualhosts.)
- Arrange your VirtualHosts such that all name-based
virtual hosts come first, followed by IP-based virtual hosts,
followed by any _default_ virtual host.
- Avoid
ServerPaths which are prefixes of
other ServerPaths. If you cannot avoid this, then
you have to ensure that the longer (more specific) prefix
vhost appears earlier in the configuration file than the
shorter (less specific) prefix (e.g., "ServerPath
/abc" should appear after "ServerPath /abcdef").
- Do not use port-based vhosts in the same server
as name-based vhosts. A loose definition of port-based is a
vhost which is determined by the port on the server
(e.g., one server listening on ports 8000, 8080, and 80, each
with a different configuration).