Difference between revisions of "Filtering Spam with Procmail"

 

Latest revision as of 15:52, 8 November 2019

The CS Department uses SpamAssassin to give every email a "spam score", which indicates the likelihood of the message being spam. We do not filter out or delete any email based on the spam score, although we do change the subject lines of messages that appear to be spam. This page describes how to filter your spam with Procmail on our systems.

You can also change SpamAssassin's configuration to affect things like how it calculates its spam scores or what score it uses to decide that a message is really spam. For more information on that, please see SpamAssassin Configuration.

Log in to a Linux Client

All of these changes must be done on one of the Department's Linux clients.

Create a .procmailrc File

We use procmail for mail filtering. It's controlled by a file in your home directory named .procmailrc (note the leading dot!).

Create that file. In most cases, that should be as easy as opening your preferred text editor, creating a new document, and saving it as .procmailrc in your home directory.

Put the following into your .procmailrc file:

MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/procmaillog
#LOGABSTRACT=all
#VERBOSE=yes

:0:
* ^X-Spam-Flag: YES
SPAM

This will take every message that SpamAssassin thinks is spam and put those messages into a mail folder named "SPAM" instead of your main inbox.

Save your .procmailrc file.

At a command prompt, run the following command to ensure that the file's permissions are appropriate:

chmod 600 ~/.procmailrc
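
If you'd rather do the steps above from a command prompt, they can be sketched as a small shell function. This is only a sketch: the function name install_procmailrc is made up for illustration, and it assumes you don't already have a .procmailrc (it refuses to overwrite one).

```shell
#!/bin/sh
# Hypothetical helper (not part of procmail): write the starter .procmailrc
# shown above to the given path and restrict its permissions.
install_procmailrc() {
    target=${1:-"$HOME/.procmailrc"}
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    # The quoted 'EOF' keeps $HOME and $MAILDIR literal, so procmail
    # (not the shell) expands them when it reads the file.
    cat > "$target" <<'EOF'
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/procmaillog
#LOGABSTRACT=all
#VERBOSE=yes

:0:
* ^X-Spam-Flag: YES
SPAM
EOF
    # Same permissions as the chmod command above: owner read/write only.
    chmod 600 "$target"
}
```

Running install_procmailrc with no argument writes to ~/.procmailrc. You can pass a different path first if you want to look at the result before putting it in place.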

Explanation

The first line, MAILDIR, sets the mail directory to be the Unix directory on our Grad/Research Net systems that you use to store your mail folders.

The LOGFILE can be set to keep a log of procmail activities for your account. You should set this; if you don't and there are problems delivering your mail, procmail will send a bounce email back to the person who sent the mail that had the problem.

LOGABSTRACT and VERBOSE will increase the information sent to the log file. They're useful for debugging your configuration. To activate them, just remove the "#" character at the beginning of their lines.

Note: Your log file will grow over time and may eventually use a lot of space. You can clear it out by running this command on one of our Linux clients: cat /dev/null >~/Mail/procmaillog
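
If you'd like to keep the most recent log entries instead of emptying the file, something like the following may work. It's a sketch, not a tested tool: trim_log is a made-up name, the default of 1000 lines is arbitrary, and anything procmail writes to the log while the function is running could be lost.

```shell
#!/bin/sh
# Hypothetical helper: keep only the last N lines of a log file (default 1000).
trim_log() {
    log=$1
    keep=${2:-1000}
    # Nothing to do if the log does not exist yet.
    [ -f "$log" ] || return 0
    scratch=$(mktemp) || return 1
    # Copy the tail aside, then write it back over the original so the
    # log file itself keeps its owner and permissions.
    tail -n "$keep" "$log" > "$scratch" &&
        cat "$scratch" > "$log"
    status=$?
    rm -f "$scratch"
    return $status
}
```

For example, trim_log ~/Mail/procmaillog 500 would keep the last 500 lines.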

The section beginning with ":0:" files mail flagged as spam to a folder called SPAM. That folder will be stored in the directory defined by the first line in the file, MAILDIR.

The ":0:" line tells procmail that you're starting a new recipe. (More precisely, ":0" starts a new recipe. The final colon (":") tells procmail that this recipe will be storing the message in a file, so it needs to lock the file while writing to make sure that no other program tries to write to the file at the same time.)

In "* ^X-Spam-Flag: YES", the "*" at the beginning of the line tells procmail that you're giving it something to test each mail against. The regular expression that follows it tells procmail to match any message with an X-Spam-Flag: header whose value is YES.
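
To get a feel for what that condition matches, you can approximate it from the command line on a message saved to a file. The sketch below is not how procmail works internally. It imitates two of procmail's defaults: conditions are checked against the message header only, and matching ignores case. The name is_flagged_spam is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a saved message's header contains a line
# that the condition "^X-Spam-Flag: YES" would match.
is_flagged_spam() {
    # sed prints lines up to the first blank line, which ends the header;
    # grep -i mirrors procmail's case-insensitive matching.
    sed '/^$/q' "$1" | grep -qi '^X-Spam-Flag: YES'
}
```

You could then run something like: is_flagged_spam saved-message.txt && echo "would be filed in SPAM" (saved-message.txt is a placeholder file name).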

The last line of the recipe, "SPAM", gives the name of the mail folder (within the directory given by MAILDIR) where matching messages should be placed.

Check Your Spam Folder

Now, the most important step: wait a bit and see whether **SPAM**-tagged mail continues to show up in your inbox. If you haven't seen any of it in a while, go to your new spam folder. (How you find and view that folder depends on your mail program.) Do you see the **SPAM**-tagged messages there? If so, it's working. If not, contact CS IT Support, and we'll try to figure out why it isn't working for you.
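
You can also check from a command prompt on a Linux client whether anything has been filed. The sketch below assumes the recipe from this page, which makes procmail store the SPAM folder as a single mbox-format file (~/Mail/SPAM) in which every message starts with a line beginning "From ". The name count_mbox_messages is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the number of messages in an mbox-format folder.
count_mbox_messages() {
    # Each message in an mbox file begins with a "From " separator line.
    # grep -c prints the count; it exits non-zero when the count is 0.
    grep -c '^From ' "$1"
}
```

For example: count_mbox_messages ~/Mail/SPAM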

VERY IMPORTANT: Check your spam folder often, for two reasons. 1) It may contain false positives: mail that was tagged as spam because it looked like spam but actually isn't. For example, a mailing list you subscribe to might send mail that looks like spam. 2) You need to clear spam out of the folder regularly to conserve space on our home directory server disks. Some **SPAM**-tagged mail can be very big, especially if it contains attachments, and could fill up our disks rapidly. Please monitor your spam folder and clean it out regularly.

Automatically Delete (Some) Spam

The score that SpamAssassin gives is an indication of how likely it thinks that a given email is spam. One common thing to do is to automatically delete messages with high scores (because they're extremely unlikely to be false positives) while keeping lower-scored (but still classified as spam) messages so you can evaluate them individually.

By default, we tag all messages with a score greater than 7 as spam. (This is very conservative. Many people use a threshold of 5 and some get good results with thresholds as low as 2.) If a message has a score of 15 or higher, it's almost certainly spam and can be discarded unread. Here's a procmail rule to accomplish that:

# Drop spam with a score of 15 or higher.
# MATCH holds only the integer part of the score (15.7 becomes 15).
:0
* ^X-Spam-Status: Yes, score=\/[0-9-]+
* ? test $MATCH -ge 15
/dev/null

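Warning: This will delete messages. Make sure you're okay with that before you use this rule. Also, place it above the recipe that files mail into the SPAM folder: procmail stops processing a message at the first recipe that delivers it, so a deletion rule placed after that recipe would never run for spam.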
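
In the rule above, the "\/" in the first condition tells procmail to save whatever the rest of the regular expression matches in a variable named MATCH. Because [0-9-]+ stops at the decimal point, MATCH holds only the integer part of the score, which is what the test command needs for an integer comparison. Here is a sketch of the same extraction done with sed on a message saved to a file, so you can see which number would be compared. The name spam_score_int is made up for illustration, and unlike procmail, sed is case-sensitive here.

```shell
#!/bin/sh
# Hypothetical helper: print the integer part of the score from the
# X-Spam-Status header of a saved message, or nothing if the message
# was not marked "Yes".
spam_score_int() {
    # Look only at the header (everything before the first blank line),
    # then keep the digits that directly follow "score=".
    sed '/^$/q' "$1" |
        sed -n 's/^X-Spam-Status: Yes, score=\([0-9-][0-9-]*\).*/\1/p' |
        head -n 1
}
```

For a message whose header says "X-Spam-Status: Yes, score=15.7 ...", this prints 15.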

Learn More about procmail

There are a number of man pages about procmail on our Linux clients. You can run man procmailex from a command prompt to see a number of useful examples of procmail use. man procmailrc describes the ins and outs of the .procmailrc file. (man procmail will tell you how to run procmail, but for the most part you shouldn't need to worry about that, since we run procmail for you automatically.)

If you're looking for more procmail examples, you can read through the extensive list at Timo's procmail tips and recipes.