Electrolite: Deploying the Lazy Web

February 9, 2003

Deploying the Lazy Web: James Gleick’s New York Times Magazine overview of the ongoing spam crisis praises SpamAssassin, a tool for filtering spam on Unix mail systems based on a variety of tests. Which reminds me of a question I’ve meant to pose to my many readers who are more technically competent than I am.

My ISP offers SpamAssassin as an option, and I’ve enabled it; I’ve even figured out how to “whitelist” certain addresses from which I want to always receive mail, even if the mail nominally flunks SpamAssassin’s various tests.

My problem has to do with the emailed copies I receive of posts to Electrolite. For whatever reason, SpamAssassin zaps some of these. I can’t “whitelist” them by originating address, because they come from many different addresses. And while they all have the string “[Electrolite] New Comment Posted” in their “Subject:” line, I can’t divine, in the thicket of SpamAssassin’s (and procmail’s) Unix-obtuse documentation, how to (in essence) “whitelist” something by subject line.

But you know something? I’ll bet someone reading this knows the answer. [09:14 AM]

Welcome to Electrolite's comments section.
Hard-Hitting Moderator: Teresa Nielsen Hayden.

Comments on Deploying the Lazy Web:

John H-R ::: February 09, 2003, 10:05 AM:

Hi. Random unix-geek here.

I'm not entirely sure how yon ISP manages local tweaks, but something like...

header ELECTROLITE Subject =~ /\[Electrolite\]/
describe ELECTROLITE Electrolite posting
score ELECTROLITE -20

... In whatever passes for your local.cf ought to do the business. Unless I've just written a paragraph of fluent Martian...

JH-R
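One detail worth checking in a rule like this: in a regular expression, square brackets delimit a character class, so a pattern meant to match the literal "[Electrolite]" tag needs the brackets escaped. The sketch below uses Python's re module as a stand-in for the Perl-style patterns SpamAssassin reads; the sample subject lines are made up for illustration.

```python
import re

subject = "Subject: [Electrolite] New Comment Posted"
unrelated = "Subject: lunch on Tuesday?"  # hypothetical non-Electrolite mail

# Unescaped: [Electrolite] is a character class that matches any ONE of
# the letters E, l, e, c, t, r, o, i, so nearly every subject matches.
loose = re.compile(r"[Electrolite]")

# Escaped: the brackets are literal, so only the real tag matches.
strict = re.compile(r"\[Electrolite\]")

loose_hits_unrelated = bool(loose.search(unrelated))
strict_hits_unrelated = bool(strict.search(unrelated))
strict_hits_subject = bool(strict.search(subject))

print(loose_hits_unrelated, strict_hits_unrelated, strict_hits_subject)
# → True False True
```

With the unescaped form, a strongly negative score would be handed to almost all mail, spam included, which is the opposite of what is wanted.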

Patrick Nielsen Hayden ::: February 09, 2003, 11:44 AM:

Nope, not Martian at all.

Panix is an old-fashioned shell-account ISP. The user-specific conf file for SpamAssassin would appear to be at ~/.spamassassin/user_prefs.

Your three lines certainly look like the kind of syntax SpamAssassin's configuration files use, so I've simply appended them to my version of this file.

I now realize that it's the first line of your three that answers my real question: how to create a new SpamAssassin criterion. The scoring and address-specific "whitelisting" are obvious; this one isn't. Thanks! We'll see if it works.

Mike Kozlowski ::: February 09, 2003, 01:54 PM:

Panix-specific info: If you're using the rc.spamassassin, it's hitting the spamd daemon, so (if I understand correctly), it'll ignore user rules.

Because the test is so easy, I'd probably just put it into procmail: create an rc.whitelist (or whatever), and put it into your procmailrc ahead of SpamAssassin, and then add a test for that subject line.

:0:
* ^Subject:.*\[Electrolite\] New Comment Posted
$MAILDIR

Or something like that. I'm not really a procmail guru, and haven't tested that. But the idea is that it'll get whipped through to your inbox without even hitting SA.

Graydon ::: February 09, 2003, 06:42 PM:

spamd results in the stuff that has passed through SpamAssassin (and undergone header and possibly message body markup) landing back in your spool file, traditionally /var/spool/mail/.

Now, panix may have this set up differently, but I'd expect that what happens is that stuff goes through spamd, and there's a default procmailrc file for each user that takes the stuff that has been marked 'spam' and hurls it into the outer darkness.

At that point, one looks at .procmailrc and puts the 'no, no, I want it' rule *ahead* of the 'hurl into outer darkness' rule. If the rule to call spamd is in your .procmailrc, you put it ahead of that, so it won't get marked up with all the reasons SpamAssassin thinks it's spam.

The upside of a whitelist is that the message won't get scribbled on, if SpamAssassin is being called as part of panix's mail transport. (I have no idea; I do know that there's lots of support for that out there.)
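The point about rule order can be shown with a toy model. This is not procmail, only a sketch of the "first delivering rule wins" behavior described above; the function names and the "inbox" and "discard" destinations are invented for illustration.

```python
def deliver(subject, flagged_as_spam, rules):
    """Return the destination chosen by the first rule that matches."""
    for matches, destination in rules:
        if matches(subject, flagged_as_spam):
            return destination
    return "inbox"  # nothing matched: fall through to normal delivery


def is_comment(subject, flagged):
    # The 'no, no, I want it' test: a plain substring check on the subject.
    return "[Electrolite] New Comment Posted" in subject


def is_spam(subject, flagged):
    # The 'hurl into outer darkness' test: trust the filter's verdict.
    return flagged


keep_first = [(is_comment, "inbox"), (is_spam, "discard")]
spam_first = [(is_spam, "discard"), (is_comment, "inbox")]

subject = "[Electrolite] New Comment Posted"
print(deliver(subject, True, keep_first))  # → inbox
print(deliver(subject, True, spam_first))  # → discard
```

The same message, flagged as spam in both runs, survives only when the keep rule is listed ahead of the discard rule.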

Erik V. Olson ::: February 09, 2003, 11:22 PM:

What Graydon said. You can attempt to write your own SpamAssassin rules, but it's not trivial.

Mike's procmailrc invocation is correct, except that the spaces in the ^Subject may cause problems, depending on the version of procmail. (If so, quote or just match ^Subject:.*\[Electrolite\])

I'd check to make sure $MAILDIR is set to the correct thing, but that's just me.

Note that if Panix is calling spamc or spamassassin in /etc/procmail, that'll get run *before* your rules in .procmailrc, and you'll end up with the spam flags in the message. If so, change the last line from $MAILDIR to

| spamassassin -d >> $MAILDIR

(Also, if you are calling spamassassin directly, see if Panix is running spamd; if so, make sure your call is to /path/to/spamc, not /path/to/spamassassin.)

Avram ::: February 10, 2003, 12:12 AM:

Erik, was "/path/to/spamc" a typo? Panix seems to have both a spamc and a spamd.

Nix ::: February 10, 2003, 06:34 AM:

No, /path/to/spamc is not a typo. spamc is the program that users run from procmailrc; it's a tiny and simple-minded client that sends the mail to the (heavy-duty) spamd daemon, which does the actual filtering; only one spamd normally runs per site. (The reason for all this is that the good old "spamassassin" filter takes ages to start up and chews the CPU while it does so; by using "spamd", the sysadmin only pays that cost once, as the system starts.)

The problem is that a malicious rule can cause arbitrary code to be executed; when using the spamc/spamd combo, your mail is being scanned by a process running as a different user, so defining rules on a user-by-user basis is by default turned off there. (The sysadmin can turn them on again, but I hope Panix haven't done that!)

You can replace "spamc" with "spamassassin", after which you *can* define user-specific rules --- but as this chews up lots more machine time, Panix might not like that. (If you replace "spamassassin" with "nice spamassassin", they might be happier...)

Hm. I fear this wasn't very clear.

Kathryn Cramer ::: February 10, 2003, 09:02 AM:

A Panix user's stupid question: How do I turn on SpamAssassin in the first place? (My incoming email is over 95% spam at this point.)

Patrick Nielsen Hayden ::: February 10, 2003, 09:33 AM:

The answer to Kathryn's question, and the answer to various people's speculations, can be found in the same place: Panix's SpamAssassin help page.

Everyone: Notwithstanding discussions of how best to configure procmail, SA, etc., John H-R's tip in the first message seems to have done the trick. Feel free to continue, though.

Kathryn, give me a call at work if you need any help.

Bob Webber ::: February 10, 2003, 11:05 AM:

I'll "fourth" Mike's, Graydon's and Erik's suggestions: there are advantages in efficiency and reliability to doing the whitelisting purely in procmail. Then the "always deliver" mail gets put in your mailbox with the least possible consumption of computrons, lowest moving-part count, and smallest number of ways for things to go wrong.

Though of course if it works well enough the way you've done it already, the maximum efficiency in terms of your time is probably to leave well enough alone.

Erik V. Olson ::: February 10, 2003, 01:06 PM:

Nix: The SpamAssassin code is reasonably efficient, except for one screaming flaw -- it's written in Perl. It's that loading of the Perl interpreter that really slams the system.

I also note that Bob is correct. With the SpamAssassin whitelist, you start procmail, then spamc, which exits quickly (since the whitelist checks happen first), then procmail delivers. Doing it solely in procmail means you're starting, I think, three fewer processes.

To easily whitelist (and killfile!), add these to the start of your .procmailrc

#Save the Good
:0:
* ? /usr/bin/fgrep -i -f $HOME/.procmail/whitelist
$MAILDIR

#Death To Idiots
:0
* ? /usr/bin/fgrep -i -f $HOME/.procmail/killfile
/dev/null

whitelist and killfile are simple lists of email addresses. I used fgrep as the fastest and simplest way to find out if a header matched, and return a result code that procmail could grok. Note, also, I don't bother locking when I'm writing to /dev/null.
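For anyone wondering what the fgrep test amounts to, here is a rough Python equivalent of `fgrep -i -f FILE` applied to the headers: succeed if any listed string appears, ignoring case, as a plain substring of some header line. It is a sketch only; the addresses are placeholders, and real fgrep treats a blank line in the pattern file as matching everything, whereas this version skips blanks.

```python
def matches_any(headers, patterns):
    """Approximate `fgrep -i -f FILE`: True if any non-empty pattern occurs
    as a case-insensitive plain substring of at least one header line."""
    wanted = [p.lower() for p in patterns if p]
    return any(p in line.lower()
               for line in headers.splitlines()
               for p in wanted)


# Hypothetical message headers and list contents.
headers = "From: A Friend <Friend@Example.org>\nSubject: Your stay"
whitelist = ["friend@example.org"]
killfile = ["pest@example.net"]

print(matches_any(headers, whitelist))  # → True
print(matches_any(headers, killfile))   # → False
```

Because the match is a plain substring test, a short entry such as a bare domain will also match any longer address that contains it, so entries are best kept as specific as possible.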

There is a reason to use SpamAssassin's whitelists -- Solicited Commercial Email. They look just like spam, but I want the stuff I've asked for from things like airlines and hotels. A simple...

spamassassin --add-addr-to-whitelist hilton.com

means I get all the stuff from hilton, no matter which username or machine they used.

I've aliased "whitelist" to the above command, and "blacklist" to the obvious reverse, to handle domain-nuking. The personal killfile is saved for personal idiots. :)

Kate Nepveu ::: February 11, 2003, 10:22 AM:

For anyone who's reading this baffled by Unix-y things, as I am, can I recommend a freeware program I just started using and really like--Mailwasher ( http://www.mailwasher.net/ )? Particularly good for people on dialups, as it downloads just enough of each message to categorize it for you.

(Windows, POP3.)

Steve Gould ::: February 17, 2003, 02:53 PM:

For people who use Eudora and don't have an ISP that will filter spam for them, there is a lovely Eudora plugin called Spamnix which uses all the rules from SpamAssassin to filter email. It's shareware, not free, but you can run it in trial mode to see if you like it.

Unfortunately, it doesn't filter the mail until it is downloaded so those of you with low bandwidth have to chug through all the junk before it's killed.

http://www.spamnix.com/