Defending E-mail HOWTO

Wil Cooley

Naked Ape Consulting

2005-04-16

Revision History
Revision 2.02005-04-16WRC
DRAFT Converted document to DocBook. DRAFT

Abstract

This document presents a guide to installing, configuring, and running amavisd-new, Postfix, SpamAssassin and related add-ons.


Table of Contents

Introduction
Copyrights and License
Source
Introduction
Understanding the Postfix content_filter Mechanism
Scanning with amavisd-new
Other Components
Sophos Anti-Virus
SpamAssassin
Configuration
Conclusion
References

Introduction

Copyrights and License

(c) 2005 Naked Ape Consulting, Ltd.

This section was copied from the HOWTO-HOWTO:

This manual may be reproduced in whole or in part, without fee, subject to the following restrictions:

  • The copyright notice above and this permission notice must be preserved complete on all complete or partial copies.

  • Any translation or derived work must be approved by the author in writing before distribution.

  • If you distribute this work in part, instructions for obtaining the complete version of this manual must be included, and a means for obtaining a complete version provided.

  • Small portions may be reproduced as illustrations for reviews or quotes in other works without this permission notice if proper citation is given. Exceptions to these rules may be granted for academic purposes: Write to the author and ask. These restrictions are here to protect us as authors, not to restrict you as learners and educators. Any source code (aside from the SGML this document was written in) in this document is placed under the GNU General Public License, available via anonymous FTP from the GNU archive.

Source

The home of this document is at http://nakedape.cc/info/Defending-Email-HOWTO/. A single-page version is at http://nakedape.cc/info/Defending-Email-HOWTO/Defending-Email-HOWTO.html. The source may be checked out from our public Subversion repository at https://haus.nakedape.cc/svn/public/trunk/writing/Defending-Email-HOWTO/.

Introduction

Spam and viruses are recurring plagues to e-mail users everywhere. Businesses face considerable risks from sexual harassment lawsuits from pornographic spam, lost productivity from damaged workstations and crashed servers and ever-increasing bandwidth usage from illegitimate network traffic. The problem is only getting worse.

This article is a gentle introduction to a complex set of services and a quick-start guide based on the configurations I have used in my own work, where I provide this setup as a service or pre-packaged server to my customers. The details of installing these software packages is beyond the scope of this article; all the packages contain installation instructions which are the authoritative sources for this information and cover installation in more detail than I am able to in this format.

This setup uses open source software for all components except for the virus-scanning, which is a licensed product from Sophos Anti-Virus. My installations have been on Red Hat Linux and Red Hat's Enterprise products using SpamAssassin 2.60, Postfix 1.1.13, and amavisd-new 20030616-p5. SpamAssassin has provided Bayesian classification since version 2.50; 2.60 introduced many improvements and is the recommended minimum. The most current amavisd-new is also recommended. Nearly any later Postfix 1.1 or 2.0 may be used.

Understanding the Postfix content_filter Mechanism

The Postfix mail transfer agent (MTA) provides content_filter, a flexible mechanism designed to allow the integration of arbitrary mail filters through a well-defined interface. Postfix passes the message into the content filter at the queuing stage, which in the diagrams in the Postfix documentation is labelled qmgr, which is the name of the internal process that maintains queues for incoming and outgoing messages.

The simplest of the content_filter interfaces is a UNIX pipe, where Postfix writes to the filter's standard input and checks the filter's return value for pre-defined status codes. Mail is injected back into Postfix using the sendmail command, which is the Postfix replacement for the same Sendmail command. The example in the FILTER_README document that ships with Postfix uses cat and acts as a simple pass-through filter.

The best interface for sophisticated mail filtering is not the pipe mechanism, but rather the SMTP proxy mechanism. This provides a well-defined interface between Postfix and the filter and allows a persistent scanning daemon, which reduces overhead and allows the filtering proxy to deliver reliably. Because the SMTP proxy mechanism operates over TCP/IP, the content filter can be located on a separate host than the Postfix service that receives the mail.

The content_filter requires Postfix to connect to the proxy as an SMTP client, which handles the message and then connects back into a special SMTP server which handles the final routing and delivery of the message. As mentioned before, Postfix sends the message to the content filter at the queuing stage; this will be called the "sending process"; the message is re-injected into the Postfix system in the smtpd stage, which will be called the "receiving process."

Scanning with amavisd-new

amavisd-new operates as an SMTP proxy, designed to hook in to the content_filter mechanism of Postfix. (amavisd-new also supports Sendmail through its Milter interface and in a dual-server configuration and Exim using it as an SMTP proxy.) amavisd-new is written, maintained, and supported by Mark Martinec of the Jozef Stefan Institue in Slovenija and is aided by an active user community. It began as a fork from amavisd, a daemonized version of the AMaViS scanner, which was one of the first open source mail virus scanners. The modern amavisd-new scarcely resemebles its predecessor and improves upon it considerably, while a difference of philosophies has prevented amavisd-new from being merged back into amavisd or simply taking over as the official scanning engine of the AMaViS project. That should not sound like a euphemism for saying that either Mark or the developers of the original AMaViS have overbearing personalities; Mark himself is helpful and personable on the amavis-users mailing list, and the original AMaViS folks have been quite tolerant of the amavisd-new traffic on their list, which these days accounts for at least half.

amavisd-new is written in Perl, and takes advantage of a number of Perl modules, such as the Net::Server module, which it uses to implement a robust forked parent/child environment, and Mail::SpamAssassin, the core scanning engine of SpamAssassin, so it does not need to use 'spamd'.

Mark strives to make amavisd-new adhere to standards and as the result is a well-regarded and standards-friendly system. It is also fast, although as a Perl application it can use a considerable amount of memory, so it is best limited to a few child processes. As amavisd-new is very flexible, the myriad of options makes it intimidating to configure.

amavisd-new is able to reliably deliver messages because it never has to store the messages itself; as soon as the Postfix sending process connects to amavisd-new, amavisd-new connects to the Postfix receiving process and does not send an acceptance response to the sending process until the receiving process has also accepted the message. Should the receiving process fail, amavisd-new will return a status code to the Postfix sending process and Postfix will maintain the message in its queue until the problem is rectified.

Other Components

Sophos Anti-Virus

Sophos Anti-Virus may be the best kept secret in the anti-virus industry. Sophos produces a very high quality scanning engine and provides very good support. Sophos caters to the enterprise, so their products are built for a diversity of platforms. Their products are not sold retail, which is part of the reason they are able to maintain the high quality of their support. (NB: I am a Sophos reseller, so my opinions are biased.)

Most of Sophos' products ship with 'sweep', a command-line scanner, but the the command-line client is mostly a wrapper around SAVI, their scanning engine library. SAVI can also be licensed on its own in blocks of 50 users. This is a good solution for customers who only want e-mail filtering or already have another anti-virus product. Sophos provides the C headers to SAVI to developers, and Vanja Hrustic used them to implement Sophie. Sophie is a daemonized scanner with which external processes communicate through a UNIX domain socket. Because loading the virus definitions and initializing the scanning engine requires considerable overhead in program initialization, it is better to have a daemonizeds canner, especially when using 'sweep' would require many invocations.

SpamAssassin

SpamAssassin itself probably needs no introduction. It is widely regarded as the best open source spam-filtering engine available (and is probably better than most proprietary offerings). It includes a wide range of heuristic patterns and scores which are applied to messages, connections to external sources, such as DNSRBLs, Razor2 and DCC. Later versions also implement Bayesian classification, which greatly improves detection quality.

SpamAssassin operates by applying a number of heuristic tests to messages, and then scoring the message. More factors are taken into consideration by using a battery of tests to produce more accurate results than an all-or-nothing keyword filter can provide. Consider, for example, if anything containing the word "viagra" were rejected as spam, much Viagra(tm)-related spam would be discarded, but you would also miss the joke forwarded by your buddy or the polite request from your wife. Configurable thresholds allow an administrator or user to decide how aggressive to be when deciding whether or not a message is spam.

The more tests SpamAssassin uses, the higher the scores will be for spam, so the higher the thresholds can be set and the more accurate filtering will be. Razor2, DCC, and Pyzor are external tests which are highly recommended. All three work by computing checksums of messages which have been recognized as spam and comparing those to databases on the Internet. DCC uses a fuzzy checksum to detect spam, especially messages where spammers have added random characters to parts of the message in an attempt to fool the other systems. If possible, register and send checksums of spam not caught by these system back to network; they operate as community efforts, so your contributions help.

SpamAssassin will also use DNS real-time black-hole lists (DNSRBLs) and right-hand-side black-hole list (RHSBLs). It assigns different scores based on the reputation or focus of the RBL, and as a result is safer than using the lists directly through your MTA; for example, instead of simply blocking all messages originating at from a dial-up address, the RBL simply adds to the score, so if the message is otherwise not spam-like, it will be allowed through. (I have seen desperate small ISPs block mail entirely from addresses owned by particular national IPSs, which is a very bad practice, since it alienates sizable portions of the Internet.) SpamAssassin is also able to apply the RBL tests to all hosts through which the message has passed, which is more effective than only applying it to the SMTP peer, as an MTA usually does.

Among the newest and most exciting features is the Bayesian classification and auto-learning, which allow it to learn what is spam and non-spam based on scores of messages it sees. Bayesian classification is a sophisticated technique to statistically determine the likelihood of a message being spam, based on identifying tokens (words, etc.) as either being most frequently used in spam or "ham" (non-spam). It's a good deal more flexible than the static regular expression matching that SpamAssassin does without it; as a result it has to learn which tokens are spam tokens and which are ham tokens, so it must be trained, usually be feeding it messages and indicated which it is. Since SpamAssassin still has a number of other tests it performs, it can also automatically teach the Bayesian classifier based on the message's score, so it continuously adapts its idea of what is spam or ham based on the messages it receives.

Configuration

First, make sure Postfix is working as it should without any filtering. Always try to proceed in small, incremental steps, testing each along the way.

Next, configure amavisd-new. The configuration file defaults to /etc/amavisd.conf, but I prefer to relocate it to /etc/amavis/amavisd.conf by including the -c option at start-up. The amavisd.conf file itself is very heavily commented and I recommend reading through the whole thing. The configuration file is all Perl code, so any settings must be valid Perl statements.

amavisd-new provides a few utility functions to provide configuration maps in a number of ways; the read_hash function uses a newline-separated list, which is read into a hash table at initialization; this is the same format Postfix uses for lists in a number of places, so there is opportunity for sharing configuration.

new_RE builds a list of Perl regular expressions. Other variables are basic Perl arrays or hashes, often using the 'qw()' form for a "quoted word" array (i.e., each whitespace-separated "word" is an element) or 'qr()' for quoted regular expressions.

Here are a few changes from the defaults to get started. The first $mydomain is obvious; set your local domain here (don't worry if you also host other domains):

$mydomain = 'example.com'; 

I prefer to run mine as user/group 'amavis/amavis', created with home directory /var/spool/amavis:

$daemon_user  = 'amavis';
$daemon_group = 'amavis'; 

amavisd-new does most of its work in a tmpfs file system, but certain things (like the SpamAssassin Bayesian database) need to be persistently stored, so set the temporary directory to a subdirectory of the home directory:

$TEMPBASE = $MYHOME/tmp; 

With Linux 2.4 and later a temporary filesystem can be used by adding the following to /etc/fstab and running 'mount /var/spool/amavis/tmp':

none /var/spool/amavis/tmp tmpfs mode=770,uid=amavis,gid=wheel 0 0 

The mode, uid and gid settings change the default permissions on the mounted filesystem; I use the 'wheel' group because I use it in other places to give administrative users extra access and to control access to 'su' and 'sudo'. This allows them to inspect the quarantine directory and such without having to become amavis.

The FILTER_README from Postfix and the README.postfix from amavisd-new differ on which ports to use; the former uses 10025 and 10026, while the latter uses 10024 and 10025. It doesn't matter which, provided that the pair is used consistently. To change to the (10025,10026) pair (which will be used in these examples), use:

$inet_socket_port = 10025;
$forward_method = 'smtp:127.0.0.1:10026';
$notify_method = $forward_method; 

To configure the domains local to the mail system, use the read_hash function as described above:

read_hash(\%local_domains, '/etc/postfix/local_domains'); 

If Postfix local domains are stored in a flat file, this will allow sharing of the configuration between both systems.

The read_hash function is also useful for maintaining white and black lists; the newline-separated list might be easier to maintain than having it within amavisd.conf, particularly if these lists will be maintained by a junior administrator.

read_hash(\%whitelist_sender, '/etc/amavis/whitelist_sender'); 

You may also want to keep a blacklist of known bad senders; however, it might be more trouble than its worth, because spammers rarely use the same address for long and the sender is usually faked, and because SpamAssassin provides superior detection with less fuss.

read_hash(\%blacklist_sender, '/etc/amavis/blacklist_sender'); 

To avoid sending bogus virus bounces from viruses like Klez that are known to fake the sender's address, there is a regular expression map; I recommend that this list be maintained very carefully, because new viruses which do this come out frequently and bounces are nearly as much of a problem as the viruses themselves (especially to those who do not run Outlook). This is done using the 'new_RE' function mentioned above:

$viruses_that_fake_sender_re = new_RE(
    qr'nimda|hybris|klez|bugbear|yaha|braid|sobig|fizzer|palyh|peido|holar'i,
    qr'tanatos|lentin|bridex|mimail|trojan\.dropper'i,
); 

You might consider, instead, changing the action amavisd-new takes to simply discard virus messages. Doing so might not alert a sender that the message he has sent had a virus in an attachment (such as when your boss sends an infected Word document to a colleague and wonders why she gets no response), so use it with caution. You can do this by using 'D_DISCARD' instead of 'D_BOUNCE' in $final_virus_destiny:

$final_virus_destiny = D_DISCARD; 

Even if anti-virus software is not used, the chance a virus will infect your users can be reduced by having amavisd-new filter dangerous attachment types which generally should not be passed through e-mail. Microsoft has a Knowledge Base article which lists these, and this setting is based on that list (see References). If you have Outlook users who like to send TNEF attachments, remove 'tnef' from below.

$banned_filename_re = new_RE(
    qr'\.[a-zA-Z][a-zA-Z0-9]{0,3}\.(vbs|pif|scr|bat|com|exe|dll)$'i,
    qr'.\.(ade|adp|bas|bat|chm|cmd|com|cpl|crt|exe|hlp|hta|inf|ins|isp|js|
            jse|lnk|mdb|mde|msc|msi|msp|mst|pcd|pif|reg|scr|sct|shs|shb|vb|
            vbe|vbs|wsc|wsf|wsh)$'ix,
    qr'^\.(exe|lha|tnef)$'i,
    qr'^application/x-msdownload$'i,
    qr'^message/partial$'i,
    qr'^message/external-body$'i,
); 

Controlling who gets scanned for what may be either the simplest or the most complex configuration setting, depending on your site. Most businesses will probably want to apply a uniform setting; an ISP might want to provide different degrees of aggressiveness depending on customer sensitivity, or simply not filter certain customers at all. In the case of applying a uniform setting, no additional configuration needs to be done. For the latter case, however, amavisd-new offers a number of options, including various Perl array, hash, or regular expression maps, and SQL and LDAP lookups. The SQL and LDAP maps have the benefit of being able to be updated without restarting the amavisd-new daemon, which is necessary in an environment where frequent changes are made. The README.lookups document in the amavisd-new distribution is the authoritative text on configuring these; the scope of these are outside of this document.

Roughly speaking, the 'lovers' maps will allow users to receive all messages, even if it is found to be spam or to contain a virus, while it will notify administrators and recipients (if configured to do so). The 'bypass' maps bypass scanning for the selected type altogether; I recommend this only for viruses. For spam, avoid the 'bypass' maps but set impossibly high values for tag and kill levels (such as '999')--this will let SpamAssassin scan the message and the autolearning feature will feed Bayesian classifier, making it more accurate. It will increase load on your server, so use with care on older systems or in high-volume environments.

The tag, tag2 and kill levels control the filtering that is applied to spam based on the score supplied by SpamAssassin. tag just adds 'X-Spam-Status' and 'X-Spam-Level' headers, so most users will not notice them. To see the score for all messages, use a very low tag level, such as '-999'. tag2 level adds 'X-Spam-Flag: YES' and prepends to the subject '***SPAM***'; I do not usually use this, because I have enough confidence in SpamAssassin that I just let it quarantine the messages. kill does just that--quarantines the message and sends a report to the administrator. The default kill level is 6.3, which is not bad; to start out, consider raising it 8 or 10 just to be safe. To set these levels globally, use the following:

$sa_tag_level_deflt = -999;
$sa_tag2_level_deflt = 6.3;
$sa_kill_level_deflt = $sa_tag2_level_deflt; 

For initial testing, disable external tests in SpamAssassin. This will disable DNSRBL, DCC, Razor2 and Pyzor tests. You will certainly want to use these, but turn them off initially until you are confident that the rest of the system is working:

$sa_local_tests_only = 1; 

To configure virus-scanning, the '@av_scanners' array lists a number of configurations for well-known virus scanners. As the comment in the default configuration file preceeding this array states, leaving in scanners which use internal subroutines such as '\&ask_daemon' will aversely affect performance; command-line scanners which are not found at initialization time will not be used, but commenting them out does not hurt and might marginally improve start-up time.

Uncomment the Sophie section:

['Sophie',
   \&ask_daemon, ["{}/\n", '/var/run/sophie'],
   qr/(?x)^ 0+ ( : | [\000\r\n]* $)/,  qr/(?x)^ 1 ( : | [\000\r\n]* $)/,
   qr/(?x)^ [-+]? \d+ : (.*?) [\000\r\n]* $/ ], 

When Sophie was built, the configure script option --with-socketfile allows it to be placed somewhere other than /var/run/sophie. If so, change the second line above to match that. If Sophie was not built with --with-user=amavis and --with-group=amavis; add the amavis user to whatever group was provided. If you do not, amavisd-new will not be able to communicate with Sophie.

To use the Bayesian classifier in SpamAssassin, set the following variables for SpamAssassin in the ~amavis/.spamassassin/user_prefs:

use_bayes 1
auto_learn 1 

Once you're confident of the rest of the system working, consider making it auto-learn faster by reducing the threshold for spam; to do so add:

auto_learn_threshold_spam 8 

By default, this is set to 12, which will make it learn fairly slowly. The Bayesian classifier will not begin to contribute to the score of messages until there are at least 200 each of spam and non-spam messages. Monitor spam alerts to determine an appropriate value for the spam threshold, although a score below 8 is not recommended.

To view the number of spam and non-spam messages, issue the command 'sa-learn --dump magic' as the 'amavis' user:

$ sudo su - amavis
$ sa-learn --dump magic
0.000          0          2          0  non-token data: bayes db version
0.000          0       3012          0  non-token data: nspam
0.000          0       4218          0  non-token data: nham
...  

Configuring Postfix requires adding the following to /etc/postfix/master.cf and issuing the command 'postfix reload':

 
smtp-amavis unix -      -       y/n     -       2  smtp
    -o smtp_data_done_timeout=1200
    -o disable_dns_lookups=yes
                                                                                
127.0.0.1:10026 inet n  -       y/n     -       -  smtpd
    -o content_filter=
    -o local_recipient_maps=
    -o relay_recipient_maps=
    -o smtpd_restriction_classes=
    -o smtpd_client_restrictions=
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o mynetworks=127.0.0.0/8
    -o strict_rfc821_envelopes=yes 

Note that the 'content_filter' in the second section is empty; since this is the receiving process, it should not pass the incoming message to the global content_filter, otherwise there would be a mail loop.

At this point, amavisd-new and Postfix are configured to receive messages, and amavisd-new is configured to inject them back into Postfix, but Postfix is not sending any mail into amavisd-new yet, so there is an opportunity to test before going live. Telnet to port 10025 on localhost and send a test message through amavisd-new using SMTP commands. You might want to send, for example, an EICAR virus-testing pattern or sample spam and non-spam messages. Also telnet to port 10026 on localhost to verify that Postifx is listening and accepting messages.

Finally, to make Postfix send incoming messages through amavisd-new, add the following to /etc/postfix/main.cf and issue the command 'postfix reload':

content_filter = smtp-amavis:[127.0.0.1]:10025 

If you run into trouble, comment the preceeding line out and reload Postfix. If you are having trouble, add debug to the command-line options to start amavisd-new and it will log verbosely and not detach from the terminal. If you need help, re-read the documentation, capture a log of the problem, and join the amavis-user mailing list (see References).

Conclusion

If you have followed these instructions through, your e-mail server is now running live with a high-performance and accurate virus and spam filter on your e-mail. You should have a basic understanding of the workings of the system, which will be of tremendous use in the event of difficulties. As with any system that can cause loss of data, monitor the spam and virus reports closely to ensure that the scanning is not too aggressive.

While spam and e-mail viruses are becoming increasing problems for users and businesses everywhere, the configuration described herein can help provide a good layer of protection to your organization and your users at a minimal cost.

References