In this chapter we look at some of the rules that are needed to make a hub function. Until now we have focused on the client form of the configuration file. Since the role of the client is narrow (to forward all mail to the hub), its configuration file is simple. But a hub can be a very busy machine, receiving and sending mail for many client machines, and because its role is broad, its configuration file is complex.
Fundamentally, all configuration files, simple and complex, tend to look pretty much the same. Both begin by selecting delivery agents using rule sets 3 and 0. Both then process recipient or sender addresses with rule sets 3, 1 or 2, R= or S=, then 4, but the hub's rules are more complex:
The hub needs to recognize more than simple Internet-style addresses. It may need to handle UUCP-style addresses or reverse-style addresses such as those used in parts of the United Kingdom. It needs rules to convert all such addresses into a form that it can understand.
The hub needs not only to forward mail (like the client), but also to deliver it to the mail spool directory, to pipe through programs, and to form mailing lists.
The hub needs to handle all error conditions gracefully and to emit helpful and clear error messages.
The hub needs to know how to connect to many different kinds of machines worldwide.
In this chapter we explore the high points of the V8 configuration files. Along the way, we also mix in rules contributed by others to help illustrate difficult concepts.
Recall that all addresses are first processed by rule set 3. Its job is to find an address among other clutter and to normalize all addresses into a form that other rules can recognize.
Recall that addresses can legally assume two forms:
address (comment)
comment <address>
In the first form, sendmail strips (and saves) the parenthesized comment, then gives the naked address to rule set 3. In the second form, sendmail passes the entire address, angle brackets and all, to rule set 3.
The rules to strip the angle brackets look like this: [1]
[1] These ingenious rules were designed by LeRoy Eide, with surrounding commentary inspired by John Halleck.
S3
R$*		$: <$1>	Guarantee at least one <> pair
R$+ <$*>	<$2>	Remove everything before the last <
R<$*> $+	$: <$1>	Remove everything after the first >
R<>		$@ <@>	Null address to @
R<$*>		$: $1	Strip remaining <>
In the following, we discuss each of these rules individually.
To find the address in addresses of the form
comment <address>
we must use rules to search for the < and > characters. Designing rules that do this is easier if we can be sure that every address has at least one surrounding angle bracket pair:
R$*		$: <$1>	Guarantee at least one <> pair
This rule places angle brackets around all addresses, even those that already have them. Note that the $: that prefixes the RHS causes it to be executed only once.
A side benefit of this rule is that it also surrounds an empty (null) address with angle brackets. This allows old versions of sendmail to detect null addresses without needing to use the new (beginning with V8.7 sendmail) $@ LHS operator. We'll cover this in more detail soon.
A common problem is that of finding the address when it is deeply nested in many pairs of angle brackets. Consider an address like this:
<<<<address>>>>
Such addresses are not common but do appear every now and then as a result of overzealous users or MUAs. Another problem address looks like this:
comment <phone> <address>
Here, just noting the outermost pair of angle brackets is not sufficient because the rightmost pair contains the address.
The process of finding the rightmost innermost pair of angle brackets requires two rules:
R$+ <$*>	<$2>	Remove everything before the last <
R<$*> $+	$: <$1>	Remove everything after the first >
The first recursively discards everything (including angle brackets) to the left of the rightmost balanced < character. The second truncates to the correct address by discarding everything following the innermost remaining angle bracket pair.
The behavior of these two rules may not be obvious. To better understand them, first create a small configuration file (called x.cf) that includes the following two lines: [2]
[2] Note that when a configuration file lacks an S command (to declare a rule set), all rules become part of rule set 0.
R$+ <$*>	<$2>
R<$*> $+	$: <$1>
Then run sendmail in rule-testing mode with a command like this:
% /usr/lib/sendmail -Cx.cf -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
Enter a series of addresses, one at a time, to see how each is handled. Be as extreme as you want when nesting angle brackets:
> 0 <<<<<a>>>>>
rewrite: ruleset 0 input: < < < < < a > > > > >
rewrite: ruleset 0 returns: < a >
> 0 <a> <b>
rewrite: ruleset 0 input: < a > < b >
rewrite: ruleset 0 returns: < b >
> 0 <<a> <b>>
rewrite: ruleset 0 input: < < a > < b > >
rewrite: ruleset 0 returns: < b >
>
If you want to see, step by step, how each rule works, run sendmail again, this time with the -d21.12 debugging switch (see Section 37.5.72, -d21.12). With that switch, the first example above will print like this:
> 0 <<<<<a>>>>>
rewrite: ruleset 0 input: < < < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < < a > > > > >
---trying rule: $+ < $* >
---rule matches: < $2 >
rewritten as: < a > > > > >
---trying rule: $+ < $* >
--- rule fails
---trying rule: < $* > $+
---rule matches: $: < $1 >
rewritten as: < a >
rewrite: ruleset 0 returns: < a >
The fourth rule in rule set 3 is designed to convert a null (empty) address into the magic symbol @:
R<>		$@ <@>	Null address to @
The @ symbol is surrounded by angle brackets ("focused"). It needs to be focused because later rules expect all addresses to have the host part in this form. Still later, the angle brackets will be removed, and the @ will be discarded by rule set 4.
The $@ prefix to the RHS causes all further rules in rule set 3 to be skipped. The focused address <@> is returned. If <@> were to be handled by the next rule, its angle brackets would be stripped, and this is not what we desire.
The last of our five preliminary rules simply removes the angle brackets from whatever remains:
R<$*>		$: $1	Strip remaining <>
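Taken together, the five preliminary rules can be modeled outside of sendmail. The following is a rough Python sketch under stated assumptions, not sendmail's implementation: it works on plain strings rather than on tokens, it uses lazy regular expressions to stand in for the way the rules above pick the shortest match, and the function name strip_angle_brackets is invented for illustration.

```python
import re


def strip_angle_brackets(address: str) -> str:
    """String-level model of the five preliminary rules (illustrative only)."""
    # R$*        $: <$1>    wrap once, so at least one <> pair exists
    work = "<" + address.strip() + ">"

    # R$+ <$*>   <$2>       repeat until the rule no longer matches
    while True:
        m = re.fullmatch(r"(.+?)<(.*)>", work, re.DOTALL)
        if m is None:
            break
        work = "<" + m.group(2) + ">"

    # R<$*> $+   $: <$1>    drop everything after the first >
    m = re.fullmatch(r"<(.*?)>(.+)", work, re.DOTALL)
    if m is not None:
        work = "<" + m.group(1) + ">"

    # At this point work always has the shape <...>
    inner = work[1:-1].strip()

    # R<>        $@ <@>     a null address becomes the focused magic token
    if inner == "":
        return "<@>"

    # R<$*>      $: $1      strip the remaining angle brackets
    return inner
```

With this model, strip_angle_brackets("comment <phone> <address>") yields address, and an empty string yields <@>, which matches the behavior described for the rules above.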
The rules that we have just looked at isolate the address from other possible information and leave it in its initial form, not surrounded by angle brackets. The rest of the rules in rule set 3 are designed to highlight the host part of any address. They assume that all addresses are composed of a user and a host part.
RFC822 allows addresses of the form
name: address(s);
Here, name is the name of a mailing list that can contain multiple words and spaces, for example,
Undisclosed Recipients :;
The colon and semicolon are mandatory and may contain one or more addresses between them, which may themselves be lists. [3] Rule set 3 needs to check for the presence of an empty list (one with no addresses between the colon and semicolon). The following rule does just that and turns the empty list into the magic token <@>:
[3] Which tends to complicate the algorithm.
R$* :;		$@ $1 :; <@>	Handle empty List:;
After lists have been disposed of, domain-type addresses need to be handled. Domain-type addresses are of the form user@host:
R$+ @ $+		$: $1 <@$2>		Focus on host
R$+ < $+ @ $+ >		$1 $2 <@$3>		move gaze right
R$* < @ $* : $* > $*	$1 <@ $2$3> $4		strip colons
R$+ < @ $+ >		$@ $>96 $1<@$2>		localize and canonicalize
The first rule detects addresses of the form something@something and rewrites them in such a way that the second something becomes the focused host part.
The second rule handles addresses with multiple @ symbols (such as a@b@c). It recursively moves the focus to the rightmost host.
The third rule recursively removes any colons from the resulting host part as a "sanity check." This is necessary because strange forms of route addresses may have bypassed earlier rules (see the DontPruneRoutes option in Section 34.8.20, DontPruneRoutes (R), how route addresses are handled in rules in Section 29.4.3, "Handling Routing Addresses", and the F=d delivery agent flag in Section 30.8.16, F=d), or a colon may be left over from the mailertable feature (see Section 19.6.14, FEATURE(mailertable)).
The fourth rule passes any addresses that have been successfully focused to rule set 96 (which will be discussed in Section 17.2, "Rule Set 96") so that the local host can be detected and the host part canonicalized. The result from rule set 96 is returned.
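The net effect of the first three domain rules can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it treats the address as a plain string, the name focus_on_host is hypothetical, and the hand-off to rule set 96 made by the fourth rule is not modeled at all.

```python
def focus_on_host(address: str) -> str:
    """Approximate the focusing done by the domain rules (illustrative only)."""
    # R$+ @ $+ and R$+ < $+ @ $+ > together leave the focus on the
    # rightmost host, so split on the last @.
    local, sep, host = address.rpartition("@")

    # $+ must match at least one token on each side of the @.
    if not sep or not local or not host:
        return address

    # R$* < @ $* : $* > $*   removes any colons from the focused host part.
    host = host.replace(":", "")

    return f"{local}<@{host}>"
```

For the a@b@c example in the text, this returns a@b<@c>: everything to the left of the rightmost @ is left alone, and only the last host is focused.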
UUCP addresses contain one or more exclamation points (such as lady!sonya!george). They fall into two categories: those that are delivered locally by uux(8) and those that are forwarded to another host. The rules to handle them look like this:
R$- ! $+	$@ $>96 $2 <@ $1.UUCP>	host!user uucp
R$+ . $- ! $+	$@ $>96 $3 <@ $1.$2>	Domain style uucp
R$+ ! $+	$@ $>96 $2 <@ $1.UUCP>	Bang path uucp
The first rule looks for a single token hostname followed by an exclamation point. A single token host always becomes the next host in line for delivery. The .UUCP suffix added in the RHS allows rule set 0 to recognize this address as one requiring uux(8) delivery.
The second rule looks for a dot in the hostname part of the address. A dot indicates the new-style, domain-based hostname, such as host.domain!user. Such names are assumed to have MX records pointing to service providers and are rewritten into the normal user@host.domain form.
The third rule catches any remaining addresses with exclamation points in them. The host to the left of the leftmost exclamation point is taken as the next hop in the UUCP path for delivery. A .UUCP suffix is added to that host, just as in the first rule.
All three rules exit (the leading $@ in the RHS) after the address is normalized by rule set 96 (which leaves .UUCP suffixed addresses unchanged). They are then handed as is to rule set 0, which selects a delivery agent (usually uux(8)).
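To see which of the three UUCP rules claims a given address, the rules can be imitated with regular expressions. The sketch below is an approximation, not sendmail code: the OPERATORS string is an assumed stand-in for sendmail's token-separating characters, the name focus_uucp is invented, lazy matching stands in for shortest-match behavior, and rule set 96 is not modeled.

```python
import re

# Assumption: characters that separate tokens in an address.
OPERATORS = ".:%@!^/[]+"
# Models $- (exactly one token): a run of non-operator, non-space characters.
TOKEN = "[^" + re.escape(OPERATORS) + r"\s]+"


def focus_uucp(address: str) -> str:
    """Apply the first matching UUCP rule to a string (illustrative only)."""
    # R$- ! $+        $@ $>96 $2 <@ $1.UUCP>    host!user
    m = re.fullmatch(rf"({TOKEN})!(.+)", address)
    if m:
        return f"{m.group(2)}<@{m.group(1)}.UUCP>"

    # R$+ . $- ! $+   $@ $>96 $3 <@ $1.$2>      domain-style host
    m = re.fullmatch(rf"(.+?)\.({TOKEN})!(.+)", address)
    if m:
        return f"{m.group(3)}<@{m.group(1)}.{m.group(2)}>"

    # R$+ ! $+        $@ $>96 $2 <@ $1.UUCP>    any other bang path
    m = re.fullmatch(r"(.+?)!(.+)", address)
    if m:
        return f"{m.group(2)}<@{m.group(1)}.UUCP>"

    # No exclamation point: none of the three rules applies.
    return address
```

Under these assumptions, lady!sonya!george is claimed by the first rule and becomes sonya!george<@lady.UUCP>, while host.domain!user falls through to the second rule and becomes user<@host.domain>.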
A common technique in mail debugging is to send mail to one host and have that host deliver it to another. Often, this is done by sending the mail to something like:
usr%second@first
Here, the intention is to send mail to first and from there to usr@second. This type of addressing is nonstandard. Essentially, it is route addressing with % characters substituted for @ characters. Enabling this behavior requires three rules:
R$*%$*		$1 @ $2			Convert all % to @
R$*@$*@$*	$1 % $2 @ $3		Undo all but last @
R$*@$*		$@ $>96 $1 <@$2>	Focus on rightmost
Here, the first rule changes all the percent characters into @ characters. The intention is to focus on the rightmost host, whether it is prefixed with a % or an @. The second rule changes all but the rightmost @ back into percent characters even if they were originally @ characters. The last rule takes the result and focuses on the rightmost host, just as was done in the domain form of addressing above.
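The three percent rules, taken in isolation from the rest of rule set 3, amount to a short string transformation. The Python below is a minimal sketch of that transformation, assuming plain strings instead of tokens; the name percent_hack and the sample addresses in the comments are illustrative, and rule set 96 is again left out.

```python
def percent_hack(address: str) -> str:
    """Model the three percent rules on a plain string (illustrative only)."""
    # R$*%$*      $1 @ $2         convert every % to @
    work = address.replace("%", "@")

    # With no @ at all, the remaining two rules cannot match.
    if "@" not in work:
        return address

    # R$*@$*@$*   $1 % $2 @ $3    turn all but the last @ back into %
    local, _, host = work.rpartition("@")
    local = local.replace("@", "%")

    # R$*@$*      $@ $>96 $1 <@$2>   focus on the rightmost host
    return f"{local}<@{host}>"


# usr%second@first  ->  usr%second<@first>   (first is the next hop)
# usr%second        ->  usr<@second>         (what first sees after stripping itself)
```

The second sample shows why the conversion is useful: once the leading host has removed its own name, the remaining % is promoted to the focused @ and delivery continues to the next host.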