Difference between revisions of "Filtering Spam with Procmail"

 

Latest revision as of 15:52, 8 November 2019

The CS Department uses SpamAssassin to give every email a "spam score", which indicates the likelihood of the message being spam. We do not filter out or delete any e-mail based on the spam score, although we do change the subject lines of messages that appear to be spam. This page describes how to filter your spam with Procmail on our systems.

You can also change SpamAssassin's configuration to affect things like how it calculates its spam scores or what score it uses to decide that a message is really spam. For more information on that, please see SpamAssassin Configuration.

Log in to a Linux Client

All of these changes must be done on one of the Department's Linux clients.

Create a .procmailrc File

We use procmail (http://www.procmail.org/) for mail filtering. It is controlled by a file in your home directory named .procmailrc (note the leading dot).

Create that file. In most cases, that should be as easy as opening your preferred text editor, creating a new document, and saving it as .procmailrc in your home directory.

Put the following into your .procmailrc file:

MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/procmaillog
#LOGABSTRACT=all
#VERBOSE=yes

:0:
* ^X-Spam-Flag: YES
SPAM

This will take every message that SpamAssassin thinks is spam and put those messages into a mail folder named "SPAM" instead of your main inbox.

Save your .procmailrc file.

At a command prompt, run the following command to ensure that the file's permissions are appropriate:

chmod 600 ~/.procmailrc
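
The steps above can be sketched as one shell session. This is only a sketch, not an official Department procedure: it writes the configuration to a scratch file first (the variable name rc_draft is made up for illustration) so that an existing .procmailrc is never overwritten by accident.

```shell
# Write the configuration to a scratch file first (hypothetical name: rc_draft).
rc_draft=$(mktemp)

# The quoted 'EOF' keeps the shell from expanding $HOME and $MAILDIR here;
# procmail expands them itself when it reads the file.
cat > "$rc_draft" <<'EOF'
MAILDIR=$HOME/Mail
LOGFILE=$MAILDIR/procmaillog
#LOGABSTRACT=all
#VERBOSE=yes

:0:
* ^X-Spam-Flag: YES
SPAM
EOF

# Only the owner may read or write the file.
chmod 600 "$rc_draft"
ls -l "$rc_draft"

# After reviewing the draft, move it into place by hand, for example:
#   mv -i "$rc_draft" ~/.procmailrc
```

Working on a draft and moving it into place at the end means procmail never sees a half-written file, and mv -i asks before replacing a configuration you already have.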

Explanation

The first line, MAILDIR, sets the directory on our Grad/Research Net systems in which your mail folders are stored. Folder names used in recipes are interpreted relative to this directory.

LOGFILE names a file in which procmail keeps a log of its activity for your account. You should set this: if you don't and there is a problem delivering a message, procmail will send a bounce email back to the person who sent that message.

LOGABSTRACT and VERBOSE will increase the information sent to the log file. They're useful for debugging your configuration. To activate them, just remove the "#" character at the beginning of their lines.

Note: Your log file will grow over time and may eventually use a lot of space. You can clear it out with this command, run from the command line of one of our Linux Clients: cat /dev/null >~/Mail/procmaillog
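
If you would rather not clear the log by hand every time, a small helper can empty it only once it has grown past some size. The sketch below is an illustration under assumptions: the function name trim_log and the 1 MB limit are invented here and are not Department policy.

```shell
# trim_log FILE MAX_BYTES
# Empties FILE only when it is larger than MAX_BYTES. Both the function name
# and the size limit used below are illustrative assumptions.
trim_log() {
    log_file=$1
    max_bytes=$2
    [ -f "$log_file" ] || return 0          # nothing to do without a log
    size=$(( $(wc -c < "$log_file") ))      # current size in bytes
    if [ "$size" -gt "$max_bytes" ]; then
        : > "$log_file"                     # same effect as: cat /dev/null > FILE
    fi
}

# Example: clear the log once it passes roughly 1 MB.
trim_log "$HOME/Mail/procmaillog" 1048576
```

The file is truncated in place rather than deleted, so procmail can keep appending to the same log file.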

The section beginning with ":0:" files mail flagged as spam to a folder called SPAM. That folder will be stored in the directory defined by the first line in the file, MAILDIR.

The ":0:" line tells procmail that you're starting a new recipe. (More precisely, ":0" starts a new recipe. The final colon (":") tells procmail that this recipe will be storing the message in a file, so it needs to lock the file while writing to make sure that no other program tries to write to the file at the same time.)

In "* ^X-Spam-Flag: YES", the "*" at the beginning of the line tells procmail that you're giving it a condition to test each message against. The regular expression that follows tells procmail to match any message with an X-Spam-Flag: header whose value is YES.

The last line of the recipe, "SPAM", gives the name of the mail folder (within the directory given by MAILDIR) where matching messages should be placed.
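
To get a feel for what the condition matches, you can approximate it with standard tools on a saved message. The sketch below is not how procmail works internally: it assumes a message saved as a plain text file, uses a made-up helper name (has_spam_flag), and relies on grep -i to imitate procmail's default case-insensitive matching against the message header.

```shell
# has_spam_flag FILE
# Succeeds when the header of the message in FILE carries "X-Spam-Flag: YES".
# The function name is a hypothetical helper for illustration.
has_spam_flag() {
    # sed prints lines up to the first blank line, i.e. the header only;
    # grep -i imitates procmail's default case-insensitive matching.
    sed '/^$/q' "$1" | grep -qi '^X-Spam-Flag: YES'
}

# Build a small sample message (invented content) and test it.
sample=$(mktemp)
printf '%s\n' \
    'From: someone@example.com' \
    'Subject: ** SPAM ** Cheap watches' \
    'X-Spam-Flag: YES' \
    '' \
    'Message body.' > "$sample"

if has_spam_flag "$sample"; then
    echo "would be filed in SPAM"
else
    echo "would go to the inbox"
fi
```

Only the header is examined, so a line reading "X-Spam-Flag: YES" inside the body of a message does not count.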

Check Your Spam Folder

Now, the most important step here is to wait a bit and see if **SPAM**-tagged mail continues to show up in your inbox. If you haven't seen any of it in a bit, then go to your new spam folder. (How you find that mail folder depends on your mail program.) Do you see the **SPAM**-tagged messages there? If so, it's working. If not, contact CS IT Support (support@cs.jhu.edu), and we'll try to figure out why it isn't working for you.

VERY IMPORTANT: When should you check your spam folder? Often. Why? Two reasons. 1) There could be some false positives in your folder: mail tagged as spam because it looked like spam, even though it is not. For example, you might be subscribed to a mailing list that sends you mail that looks like spam. 2) You need to go through your spam folder regularly and delete the spam in it to conserve room on our home directory server disks. Some of the **SPAM**-tagged mail could be very big, especially if it contains attachments, and could fill up our disks rapidly. Please monitor your spam folder and clean it out regularly.
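
A quick way to keep an eye on the folder from the command line is to count its messages and check its size. The sketch below assumes the SPAM folder is a single mbox file, which is what procmail writes when a recipe names a plain file; the helper name spam_summary is invented for illustration.

```shell
# spam_summary FILE
# Prints an approximate message count and the size in bytes of an mbox file.
# The function name is a hypothetical helper for illustration.
spam_summary() {
    folder=$1
    if [ ! -f "$folder" ]; then
        echo "no spam folder at $folder yet"
        return 0
    fi
    # In mbox format each message starts with a line beginning "From ".
    count=$(grep -c '^From ' "$folder" || true)
    bytes=$(( $(wc -c < "$folder") ))
    echo "$count message(s), $bytes bytes in $folder"
}

spam_summary "$HOME/Mail/SPAM"
```

This only reports; deleting the spam itself is still best done from your mail program, where you can look for false positives first.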

Automatically Delete (Some) Spam

The score that SpamAssassin gives is an indication of how likely it thinks that a given email is spam. One common thing to do is to automatically delete messages with high scores (because they're extremely unlikely to be false positives) while keeping lower-scored (but still classified as spam) messages so you can evaluate them individually.

By default, we tag all messages with a score greater than 7 as spam. (This is very conservative. Many people use a threshold of 5 and some get good results with thresholds as low as 2.) If a message's score is above 15 (the rule below compares only the whole-number part of the score, so in practice 16 or higher), it's almost certainly spam and can be discarded unread. Here's a procmail rule to accomplish that:

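# Drop spam over score 15.
# The first condition captures the whole-number part of the score into $MATCH
# (everything matched after "\/"); the second runs test to compare it with 15.
# No lock is needed because the message is discarded to /dev/null.
:0
* ^X-Spam-Status: Yes, score=\/[0-9-]+
* ? test $MATCH -gt 15
/dev/null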

Warning: This will delete messages. Make sure you're okay with that before you use this rule.
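
Before trusting a rule that deletes mail, it helps to see which scores it would act on. The sketch below imitates the rule's two conditions with sed and test; it is an approximation for experimenting, not procmail itself. The helper name would_drop and the sample header lines are invented for illustration.

```shell
# would_drop HEADER_LINE
# Approximates the two conditions of the recipe: capture the leading digits
# (and minus sign) after "score=", then compare them as an integer with 15.
# The function name is a hypothetical helper for illustration.
would_drop() {
    match=$(printf '%s\n' "$1" |
        sed -n 's/^X-Spam-Status: Yes, score=\([0-9-][0-9-]*\).*/\1/p')
    [ -n "$match" ] && test "$match" -gt 15
}

# Made-up sample headers covering a high score, a borderline score, and ham.
for status in \
    'X-Spam-Status: Yes, score=22.4 required=7.0' \
    'X-Spam-Status: Yes, score=15.9 required=7.0' \
    'X-Spam-Status: No, score=-1.2 required=7.0'
do
    if would_drop "$status"; then
        echo "drop: $status"
    else
        echo "keep: $status"
    fi
done
```

Because only the whole-number part is captured, a score of 15.9 is compared as 15 and the message is kept. While you are still testing, a cautious alternative is to replace /dev/null in the recipe with the name of a second folder, so nothing is lost if the rule misfires.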

Learn More about procmail

There are a number of man pages about procmail on our Linux clients. You can run man procmailex from a command prompt to see a number of useful examples of procmail use. man procmailrc describes the ins and outs of the .procmailrc file. (man procmail will tell you how to run procmail, but for the most part you shouldn't need to worry about that, since we run procmail for you automatically.)

If you're looking for more procmail examples, you can read through the extensive list at Timo's procmail tips and recipes (http://www.netikka.net/tsneti/info/proctips.php).