Using Regular Expressions in sendmail

Run the sendmail -d0.1 command. The “Compiled with:” line output by the command should contain MAP_REGEX. If it does not, recompile sendmail.
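The check described above can be scripted. The following is a minimal sketch, not part of sendmail: has_map_regex and report_regex_support are hypothetical helper names, and they only inspect whatever text is piped into them.

```shell
#!/bin/sh
# has_map_regex (hypothetical helper): reads sendmail's -d0.1 output on
# standard input and succeeds only if MAP_REGEX appears in it. The whole
# output is searched because the "Compiled with:" list can wrap onto
# continuation lines.
has_map_regex() {
    grep -q 'MAP_REGEX'
}

# report_regex_support (hypothetical helper): prints a short verdict
# for the same input.
report_regex_support() {
    if has_map_regex; then
        echo "regex map support: present"
    else
        echo "regex map support: missing (recompile sendmail with MAP_REGEX)"
    fi
}
```

One possible invocation is sendmail -bt -d0.1 < /dev/null | report_regex_support. The -bt option and the /dev/null redirect are assumptions added here so that the command does not wait for input; they are not part of the command shown above.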

Add custom code to the end of the master configuration file. Add the K command that defines the regular expression to the local information section of the sendmail.cf file using the LOCAL_CONFIG macro, and use a LOCAL_RULESETS macro to add a custom ruleset to access the regular expression. The Discussion section provides an example of how these commands are used.

Build the sendmail.cf file, copy it to /etc/mail/sendmail.cf, and restart sendmail.
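The copy step is a good place to keep a backup of the working configuration. The sketch below assumes nothing about where the new file was built; install_cf is a hypothetical helper name, and both paths are supplied by the caller.

```shell
#!/bin/sh
# install_cf (hypothetical helper): keeps a backup of the current
# configuration, if one exists, and then copies the newly built file
# into place.
#   $1 = newly built sendmail.cf
#   $2 = destination path
install_cf() {
    src=$1
    dest=$2
    # Refuse to install a missing or empty file.
    [ -s "$src" ] || { echo "install_cf: $src is missing or empty" >&2; return 1; }
    # Preserve the existing configuration as <dest>.bak before overwriting it.
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}
```

A typical call would be install_cf sendmail.cf /etc/mail/sendmail.cf, run as root after the build and before the restart. How the file is built and how the daemon is restarted depend on the local installation and are not covered by this sketch.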

Regular expressions are defined using the sendmail.cf K command, which is the same command used to define a database. The regular expression is then accessed from within the configuration in the same manner as a normal database. The following example, taken from the knecht.mc file, illustrates how a regular expression is defined and used:

LOCAL_CONFIG
#
#  Regular expression to reject:
#    * numeric-only localparts from aol.com and msn.com
#    * localparts starting with a digit from juno.com
#
Kcheckaddress regex -a@MATCH
   ^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>

LOCAL_RULESETS
SLocal_check_mail
# check address against various regex checks
R$*                             $: $>Parse0 $>3 $1
R$+                             $: $(checkaddress $1 $)
R@MATCH                         $#error $: "553 Header error"

First, the LOCAL_CONFIG macro is added to the m4 master configuration file. The LOCAL_CONFIG macro marks the start of code that is to be added to the local information section of the sendmail.cf file. The K command that defines the regular expression follows this macro. The syntax of the K command is:

Kname type arguments

where K is the command, name is the internal name used to access the database defined by this command, type is the database type, and the arguments define the database being used. The arguments have the format:

flags description

where the flags define options used by the database and description identifies the database being used. The description, in most cases, is a path to an external database, either a local database or a map accessible through a database server. For a regular expression, however, the description is the definition of the regular expression against which input data is matched. The K command in the example is:

Kcheckaddress regex -a@MATCH     ^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>

In this example:

  • K is the command.
  • checkaddress is the internal name.
  • regex is the type.
  • -a@MATCH is a flag that tells sendmail to return the value @MATCH when a match is found.
  • ^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?> is a regular expression. It uses extended regular expression syntax of the kind accepted by tools such as egrep and awk. This regular expression matches addresses from aol.com and msn.com whose usernames are entirely numeric, and addresses from juno.com whose usernames begin with a digit.
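Because the pattern uses the same syntax as egrep, it can be tried outside sendmail before it goes into the configuration. The following sketch assumes the address reaches the map in focused form with its tokens joined without spaces (for example, 123<@aol.com.>), as the pattern itself implies. check_address is a hypothetical helper that only imitates the map's visible behavior: it prints @MATCH on a match and the unchanged address otherwise.

```shell
#!/bin/sh
# Pattern copied from the K command in the example.
pattern='^([0-9]+<@(aol|msn)\.com|[0-9][^<]*<@juno\.com)\.?>'

# check_address (hypothetical helper, not part of sendmail):
# prints @MATCH when the focused address matches the pattern,
# otherwise prints the address unchanged.
check_address() {
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        printf '%s\n' '@MATCH'
    else
        printf '%s\n' "$1"
    fi
}

check_address '123<@aol.com.>'     # numeric-only localpart at aol.com
check_address 'win<@aol.com.>'     # alphabetic localpart at aol.com
check_address '4you<@juno.com.>'   # localpart starting with a digit at juno.com
check_address '4you<@aol.com.>'    # leading digit, but not numeric-only
```

Only the first and third addresses should produce @MATCH. The last case shows that the juno.com rule (leading digit) is not applied to aol.com, where the whole username must be numeric.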

The K command defines the regular expression, but a rewrite rule is needed to use it. The LOCAL_RULESETS macro is used to insert a custom ruleset into the sendmail.cf file. At the heart of the sample Local_check_mail ruleset are three R commands:

R$*                             $: $>Parse0 $>3 $1
R$+                             $: $(checkaddress $1 $)
R@MATCH                         $#error $: "553 Header error"

The address passed to the Local_check_mail ruleset is first processed through ruleset 3 (also called the canonify ruleset), and the result of that process is then passed through the Parse0 ruleset. Note that both of these rulesets are called by the first rewrite command. This processing puts the address into its canonical form. The address is then pattern matched against the checkaddress regular expression by the second rewrite rule. If it matches the regular expression, the address is replaced by the string @MATCH. The third rewrite rule checks to see if the workspace contains that string. If it does, a header error is returned.

A few tests show how the regular expression and the ruleset work:

# sendmail -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> Local_check_mail 123@aol.com
Local_check_mail   input: 123 @ aol . com
canonify           input: 123 @ aol . com
Canonify2          input: 123 < @ aol . com >
Canonify2        returns: 123 < @ aol . com . >
canonify         returns: 123 < @ aol . com . >
Parse0             input: 123 < @ aol . com . >
Parse0           returns: 123 < @ aol . com . >
Local_check_mail returns: $# error $: "553 Header error"
> Local_check_mail win@aol.com
Local_check_mail   input: win @ aol . com
canonify           input: win @ aol . com
Canonify2          input: win < @ aol . com >
Canonify2        returns: win < @ aol . com . >
canonify         returns: win < @ aol . com . >
Parse0             input: win < @ aol . com . >
Parse0           returns: win < @ aol . com . >
Local_check_mail returns: win < @ aol . com . >
> /quit

The first test passes the address 123@aol.com to the Local_check_mail ruleset. This address should match the checkaddress regular expression. The error returned by the Local_check_mail ruleset shows that it does. The second test is run to show that valid addresses from aol.com do not generate the error.
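The same tests can be run without typing at the prompt by feeding the test lines to sendmail -bt on standard input. The sketch below is one way to do that. run_check_mail and is_rejected are hypothetical helper names, and the -C option is used on the assumption that the new configuration is being tested before it is installed.

```shell
#!/bin/sh
# run_check_mail (hypothetical helper): sends one Local_check_mail test
# line per address to sendmail's address test mode.
#   $1 = configuration file to test
#   remaining arguments = addresses to check
run_check_mail() {
    cf=$1
    shift
    for addr in "$@"; do
        printf 'Local_check_mail %s\n' "$addr"
    done | sendmail -bt -C"$cf"
}

# is_rejected (hypothetical helper): succeeds if the transcript on
# standard input contains a ruleset result that resolves to the error mailer.
is_rejected() {
    grep -q 'returns: \$# error'
}
```

For example, run_check_mail ./sendmail.cf 123@aol.com | is_rejected && echo rejected checks a single address. Passing one address per call keeps the result unambiguous, because is_rejected looks at the whole transcript rather than at individual tests.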

This example, taken from the knecht.mc file, is not a recommendation that you filter out numeric aol.com addresses. It is an example of how a regular expression is defined and used. The LOCAL_CONFIG macro, the LOCAL_RULESETS macro, and the syntax of the K command are the same for the custom regular expressions and rulesets that you create as they are for this simple example.