Filtering Spam with Procmail

Revision as of 13:05, 11 May 2011 by Steve410

The CS Department runs Spamassassin on the Grad/Research Net only to identify spam e-mail and to tag the Subject line of that spam with the text ** SPAM **.

We do not filter out or delete any e-mail tagged as ** SPAM **. We merely tag it as such, and you can use your own mail client or procmail to filter/delete/move that mail.

(The exception is mail with viruses. Our mail server's virus checker will remove mail that contains any virus that it can detect.)


Customizing Your Spamassassin Preferences

The mail server runs every incoming CS message through a global spamassassin check based on various criteria. You can add your own criteria, or override some of the global settings, by providing personal spamassassin options.

To do this:

1) Create a directory called .spamassassin (don't forget the dot) in your home directory.


2) In your .spamassassin directory, create a file called user_prefs.

user_prefs is the configuration file that holds your spamassassin customizations.

For example, to change your Subject tag to something different, you could add this line to the user_prefs file:

subject_tag **THIS IS S-P-A-M**

And then, mail identified as spam will have this new tag in the Subject line instead of the default, ** SPAM **.
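The two steps above can also be done from a Unix prompt. The sketch below defines a hypothetical helper, add_user_pref (it is not a department-provided script), that creates the .spamassassin directory if it doesn't exist and appends one option line to user_prefs. It takes the home directory as an argument so you can try it on a scratch directory first. It does not check for duplicates, so running it twice adds the line twice.

```shell
# Hypothetical helper, not a department-provided script.
# add_user_pref HOME_DIR "option value"
# Creates HOME_DIR/.spamassassin if needed and appends one line to user_prefs.
add_user_pref() {
    home_dir=$1
    pref_line=$2
    # -p: no error if the directory already exists.
    mkdir -p "$home_dir/.spamassassin" || return 1
    # Append (>>) rather than overwrite, so existing customizations are kept.
    printf '%s\n' "$pref_line" >> "$home_dir/.spamassassin/user_prefs"
}
```

For example, to add the tag shown above:

  add_user_pref "$HOME" 'subject_tag **THIS IS S-P-A-M**'

The single quotes keep the shell from treating the asterisks as filename wildcards.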

Other options can be found at: http://www.spamassassin.org/doc/Mail_SpamAssassin_Conf.html (NOTE: Not all of the options on that page are available to individual users, and we run an older version of spamassassin, so some options listed there may not be available in our version.)

If you want to see more header information, the page above also describes how to configure that.

One commonly configured option is the number of "required hits". When spamassassin runs its tests, it gives each incoming message a hit value (think of it as a score). The higher the score, the more likely the message is spam; the lower the score, the more likely it is ham (ham is mail that is not spam). In the CS department, mail that scores 7.0 or more is tagged as spam. You can lower that threshold to, say, 5.0 or 6.0, so that a message scoring only 5 or 6 is also tagged as spam. This increases the risk of false positives (non-spam mail tagged as spam), but it also means more of your spam gets tagged. Most users do not need to set "required hits" at all, since the global configuration already sets the threshold to 7.0. If you do want to change it, add a line like this to your user_prefs file:

required_hits 5.8
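To make the threshold concrete, here is a small worked example. A message is tagged when its score is at or above the required value, so a message scoring 6.2 is left alone under the department's 7.0 default but tagged with required_hits 5.8. The sketch uses awk for the comparison because plain sh arithmetic only handles whole numbers; would_tag is a hypothetical name, and it only illustrates the rule, it does not query spamassassin.

```shell
# Hypothetical illustration of the threshold rule, not part of spamassassin.
# would_tag SCORE THRESHOLD
# Prints "tagged" if SCORE >= THRESHOLD, otherwise "not tagged".
# awk does the comparison because sh arithmetic cannot compare decimals like 5.8.
would_tag() {
    awk -v s="$1" -v t="$2" 'BEGIN { if (s + 0 >= t + 0) print "tagged"; else print "not tagged" }'
}

would_tag 6.2 7.0   # prints "not tagged" (department default)
would_tag 6.2 5.8   # prints "tagged"     (with required_hits 5.8)
would_tag 7.0 7.0   # prints "tagged"     (exactly at the threshold counts)
```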


Remember, at this point you are not deleting spam; you are only tagging its Subject line. If you don't create your own user_prefs file, spamassassin uses the mail server's defaults to decide which mail to tag as spam.


Configuring Your Account to Filter ** SPAM **-Tagged Mail with procmail

Procmail is a program that lets you process mail messages as they arrive. There is a global (server) procmail configuration, but you can also create your own configuration for your account. Procmail has many options; you can find useful tips and recipes at: http://lipas.uwasa.fi/~ts/info/proctips.html

As a simple example, take a user named joe who wants to file mail tagged ** SPAM ** into a separate spam folder for later review.

Create a file called .procmailrc (don't forget the dot) in your home directory.

Edit the file to look something like this:

MAILDIR=/users/joe/Mail
DEFAULT=/var/mail/joe
LOGFILE=$MAILDIR/procmaillog
LOCKEXT=.lock

:0:
* ^Subject:.*\*\* *SPAM *\*\*
SPAMFOLDER

The first line, MAILDIR, sets the mail directory to be the Unix directory on our Grad/Research Net systems that you use to store your mail folders.

The DEFAULT line is standard; it is the location of your inbox.

The LOGFILE line tells procmail to keep a log of its activity for your account, in a file called procmaillog in your MAILDIR.

Note: Please clean out your procmaillog file occasionally by running the following command on a Grad/Research Net machine:

  cat /dev/null > /users/joe/Mail/procmaillog

(Of course, change "joe" to your account name.)
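If you would rather empty the log only when it has grown large, a small function like the hypothetical trim_procmaillog below does that. The 1 MB default limit is an arbitrary example, not a department rule. The : > form truncates the file to zero length, the same effect as cat /dev/null >.

```shell
# Hypothetical helper, not a department-provided script.
# trim_procmaillog LOGFILE [MAX_BYTES]
# Empties LOGFILE if it is larger than MAX_BYTES (default 1048576, i.e. 1 MB).
trim_procmaillog() {
    log=$1
    max=${2:-1048576}
    [ -f "$log" ] || return 0          # nothing to do if the log doesn't exist
    # tr strips the leading spaces some versions of wc print.
    size=$(wc -c < "$log" | tr -d ' ')
    if [ "$size" -gt "$max" ]; then
        : > "$log"                     # truncate in place, keeping the file
        echo "emptied $log ($size bytes)"
    fi
}
```

For example (again, change "joe" to your account name):

  trim_procmaillog /users/joe/Mail/procmaillog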

Leave the LOCKEXT line exactly as it is.

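The section beginning with :0: is the recipe that files any message with the ** SPAM ** tag in its Subject into a folder called SPAMFOLDER. That folder is stored in the directory set by the first line of the file, MAILDIR. The line beginning with * ^Subject is the condition; it says, "if the Subject contains ** SPAM **, then file the message into SPAMFOLDER." The second colon in :0: tells procmail to lock SPAMFOLDER while writing to it (using a lock file named with the LOCKEXT extension), so two messages arriving at once don't get mixed together.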

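Notice the backslashes. In procmail's patterns (regular expressions), * is a special character: a bare * means "zero or more of whatever comes just before it," so .* means "anything." To match a literal * character, you put a backslash in front of it (\*). This is called "escaping" the character. The * at the very start of the line is different again: it tells procmail that the line is a condition. That is why some of the * characters in that line are escaped and some are not. Just follow the example.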
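Before relying on the recipe, you can try a pattern against a few sample Subject lines with grep -E, whose extended regular expressions are similar to procmail's. The -i flag mirrors procmail's default case-insensitive matching. This only approximates what procmail does, and matches_spam_tag is a hypothetical helper name. The pattern here allows optional spaces inside the asterisks, so it accepts both "** SPAM **" and "**SPAM**".

```shell
# Hypothetical helper for testing a pattern outside procmail.
# matches_spam_tag "HEADER LINE"
# Succeeds (exit 0) if the line looks like a Subject tagged ** SPAM **.
# -E: extended regex (close to procmail's), -i: ignore case, -q: no output.
matches_spam_tag() {
    printf '%s\n' "$1" | grep -Eiq '^Subject:.*\*\* *SPAM *\*\*'
}

for subj in 'Subject: ** SPAM ** Cheap watches' \
            'Subject: **SPAM** Cheap watches' \
            'Subject: Re: spam filter question'; do
    if matches_spam_tag "$subj"; then
        echo "would file: $subj"
    else
        echo "would keep: $subj"
    fi
done
```

Running it prints "would file" for the first two subjects and "would keep" for the third, which mentions spam but has no tag.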

Save your .procmailrc file.

At a Unix command prompt in your home directory, run the following command:

chmod 600 .procmailrc
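If you prefer, you can create .procmailrc and set its permissions in one step from the prompt. The sketch below writes joe's example configuration with a here-document; quoting the delimiter ('EOF') stops the shell from expanding $MAILDIR or altering the backslashes, so the file receives exactly the text shown. The Subject pattern here allows optional spaces around SPAM, so it matches both "** SPAM **" and "**SPAM**". write_procmailrc is a hypothetical name; the target directory is an argument so you can try it on a scratch directory, you must change joe to your own account name, and it overwrites any existing .procmailrc.

```shell
# Hypothetical helper, not a department-provided script.
# write_procmailrc DIR
# Writes a .procmailrc into DIR (normally "$HOME") and makes it private.
# Change "joe" to your own account name before using it.
write_procmailrc() {
    dir=$1
    # The quoted 'EOF' keeps $MAILDIR and the backslashes literal.
    cat > "$dir/.procmailrc" <<'EOF'
MAILDIR=/users/joe/Mail
DEFAULT=/var/mail/joe
LOGFILE=$MAILDIR/procmaillog
LOCKEXT=.lock

:0:
* ^Subject:.*\*\* *SPAM *\*\*
SPAMFOLDER
EOF
    # Owner read/write only, the same as chmod 600 .procmailrc.
    chmod 600 "$dir/.procmailrc"
}
```

For example:

  write_procmailrc "$HOME"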

At this point, we can tell the mail system to start filing our ** SPAM **-tagged mail.

Activate the .procmailrc Configuration

To do this, create (or edit) a file called .forward (again, remember the dot) in your top-level home directory.

In .forward, place the following text:

"| /usr/local/scripts/spamc |exec /usr/local/bin/procmail"

(Include the double quotation marks at the start and end of the line.)

Save the file and make sure its permissions are correct. If you're not sure, set them at a Unix prompt in your home directory with:

chmod 600 .forward
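The .forward step can be scripted the same way. In the hypothetical write_forward below, the single quotes around the printf argument keep the double quotes and the | characters literal, so the file contains exactly the line shown above. It overwrites any existing .forward, so look at yours first if you already have one. The function finishes by printing the file's permission string, which should read -rw-------.

```shell
# Hypothetical helper, not a department-provided script.
# write_forward DIR
# Writes the .forward line into DIR (normally "$HOME"), restricts it to
# owner read/write, and prints its permission string.
write_forward() {
    dir=$1
    printf '%s\n' '"| /usr/local/scripts/spamc |exec /usr/local/bin/procmail"' > "$dir/.forward"
    chmod 600 "$dir/.forward"
    # The first 10 characters of ls -l are the file type and permission bits.
    ls -l "$dir/.forward" | cut -c1-10
}
```

For example:

  write_forward "$HOME"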

That should do it. ** SPAM **-tagged mail should now start showing up in your new spam folder (SPAMFOLDER, in your MAILDIR directory).

Now, the most important step is to wait a while and check whether ** SPAM **-tagged mail still shows up in your inbox. If you haven't seen any for a while, open your new spam folder. (How you open that folder depends on your mail program.) Do you see the ** SPAM **-tagged messages there? If so, filtering is working. If not, contact support@cs.jhu.edu, and we'll try to figure out why it isn't working for you.
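One quick check is to count the tagged Subject lines in your inbox and in the spam folder. The hypothetical count_tagged below uses grep -c, which counts matching lines, so the result is approximate: it misses a Subject header folded onto two lines and would also count a tagged Subject line quoted inside a message body. The paths in the usage lines assume joe's example settings.

```shell
# Hypothetical helper, not a department-provided script.
# count_tagged MAILBOX
# Prints how many lines in MAILBOX look like a ** SPAM **-tagged Subject header.
count_tagged() {
    if [ ! -f "$1" ]; then
        echo 0                         # treat a missing mailbox as empty
        return 0
    fi
    # grep -c prints 0 but exits with status 1 when nothing matches;
    # "|| true" keeps that from being treated as an error.
    grep -Eic '^Subject:.*\*\* *SPAM *\*\*' "$1" || true
}
```

For example:

  count_tagged /var/mail/joe
  count_tagged /users/joe/Mail/SPAMFOLDER

Once filtering is working, the first count should stop growing while the second one rises.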

VERY IMPORTANT: Check your spam folder often, for two reasons:

1) It may contain false positives: mail tagged as spam because it looked like spam, but that actually isn't. For example, a mailing list you subscribe to may send you mail that looks like spam.

2) You need to clear out (delete) the messages in your spam folder *regularly* to conserve space on our home directory server disks. Some ** SPAM **-tagged mail can be very large, especially if it contains attachments, and it can fill up our disks quickly.
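To see how much space the folder is taking, du -k reports a file's disk usage in kilobytes. The hypothetical check_spamfolder below prints that figure and a warning once it passes a limit you choose; the 5000 KB default is an arbitrary example, not a department quota. It only reports; it deletes nothing.

```shell
# Hypothetical helper, not a department-provided script.
# check_spamfolder FOLDER [LIMIT_KB]
# Prints the folder's disk usage in KB and a warning if it exceeds LIMIT_KB.
check_spamfolder() {
    folder=$1
    limit=${2:-5000}
    if [ ! -f "$folder" ]; then
        echo "no spam folder at $folder"
        return 0
    fi
    # du -k prints the size followed by the name; keep only the size.
    kb=$(du -k "$folder" | awk '{ print $1 }')
    echo "$folder: ${kb} KB"
    if [ "$kb" -gt "$limit" ]; then
        echo "warning: over ${limit} KB, time to clean it out"
    fi
}
```

For example:

  check_spamfolder /users/joe/Mail/SPAMFOLDER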