Keeping Spam in the Can

Wife: I don't like spam!
Man: Sshh, dear, don't cause a fuss. I'll have your spam. I love it. I'm having spam, spam, spam, spam, spam, spam, spam, baked beans, spam, spam, spam, and spam!
Vikings: Spam spam spam spam. Lovely spam! Wonderful spam!
Monty Python, “The Spam Sketch”

If you have an e-mail account, then you probably agree with the wife. The problem is that interspersed with all that nasty spam are juicy chunks of real e-mails from people we want to hear from. How do you sort the good from the bad?

Only you know what kinds of e-mails you want to receive, and (especially in a University environment) no one but you should be messing with your ability to decide what you read and what is filtered out. So the University’s e-mail server doesn’t pre-filter your e-mail. Virus-laden e-mails are quarantined, but aside from that, every e-mail that explains that you can win this, lengthen that, lose this, buy that, et cetera, ad nauseam, gets through.

However, the e-mail server does add something potentially very useful to every e-mail that passes through it. It uses a piece of software called SpamAssassin to scan the contents of each message and assign each one a “spam score.” The spam score for each message is included as a header, along with the normal headers like “To,” “From,” “Date,” and “Subject.”

So big deal, you’re thinking. What good does that do? To understand, you need to know about filters. Most e-mail clients (like Outlook, Netscape Mail and Apple Mail) have the ability to filter out spam by examining the contents of each message and sending anything they think is spam to the trash (or a special folder). For example, a simple filter might look for the word “Viagra” in the body of a message. If it finds the word, then it sends the message straight into a Junk folder (or straight to Trash if you want).
You can create more complex filters that look for words and phrases in different parts of the e-mail, so that they are less likely to filter out a message you want to keep. For example, you might ask a filter to send a message to Trash if:

1) the message contains the word “Viagra” AND
2) the sender’s address does NOT end with “webster.edu” AND
3) the sender is NOT in your address book

(Be careful if you tend to get a lot of work mail from outside Webster.)

This method works fairly well, but it has limitations, and it takes time to set up filters for all the words you want blocked. One significant limitation is that spammers have started to vary the spelling of some of the words you are likely to filter. “Viagra,” “Vîagra,” “Vi a gra,” “Vi ågrâ,” etc. are all variations of the same word, and you have to filter out each one separately.

Another kind of filtering system, called Bayesian filtering, uses statistical analysis to assess the probability that a particular message is spam. You spend a few days training the system by telling it which messages are spam and which aren’t. This allows the system to set the probabilities based on what you consider to be spam. The nice thing about this system is that, when properly trained, it filters out virtually all of the spam and you see very few false positives.

And you don’t want false positives. A false positive occurs when a message you’d like to receive is filtered out. A long-lost high school buddy has finally tracked you down and, zip! Oops! Sent to the Junk folder. Your boss sends you that information that you’ve been waiting on for that important project and, swoosh! That puppy is toast.

SpamAssassin uses a system that looks at a message’s content and compares attributes of that message with those of typical spam. Like Bayesian filtering, it analyzes each message as a whole. But, unlike the Bayesian system, you don’t get to train it to your notion of what is spam.
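Since training is the part that sets Bayesian filtering apart, here is a toy Python sketch of what “training” can mean. It is only an illustration of the general idea (counting which words show up in messages you have labeled), not the method used by any particular e-mail client and not how SpamAssassin works. The class name ToyBayesFilter, the labels “spam” and “ham,” and the sample messages are all invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class ToyBayesFilter:
    """A toy word-frequency filter trained on the user's own labels."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has marked as 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability (0.0 to 1.0) that a message is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: how often this label appeared in training (smoothed).
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            log_score = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word never gives probability 0.
                count = self.word_counts[label][word] + 1
                log_score += math.log(count / (total_words + len(vocabulary) + 1))
            log_scores[label] = log_score
        # Turn the two log scores back into a normalized probability.
        highest = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - highest)
        ham = math.exp(log_scores["ham"] - highest)
        return spam / (spam + ham)


# Made-up training data standing in for a few days of marking messages.
spam_filter = ToyBayesFilter()
spam_filter.train("Click here to buy cheap pills now", "spam")
spam_filter.train("Win a free prize, click here", "spam")
spam_filter.train("Here are the project notes you asked for", "ham")
spam_filter.train("Lunch on Friday with the committee?", "ham")

print(spam_filter.spam_probability("click here for a free prize"))
print(spam_filter.spam_probability("notes for the committee project"))
```

The first test message should come out well above 0.5 and the second well below it. A real Bayesian filter works from far more training messages and is tuned more carefully to avoid false positives.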
SpamAssassin looks for a set of attributes, and every time it finds one of them in a message it adds that attribute’s value to a “spam score.” The attribute might be one that is spam-like or one that is not-spam-like. For example, SpamAssassin might see “click here” in the body of a message and say, “That’s very spam-like. I’ll give that a ‘3.0.’” And then it’ll see that the subject line is in all caps. “That’s kind of spam-like, that’ll be a 1.1.” But at the same time it’ll see that the message is a reply to a previous e-mail message (maybe it has the line “on [date, person’s name] wrote:” followed by some quoted lines) and say, “That’s not spam-like. I’ll give that a -3.8.” It adds up to 0.3 (3.0 + 1.1 - 3.8). That’s a pretty small number. Probably not spam. Glad we didn’t filter it out.

SpamAssassin puts that spam score into a header called “X-Spam-Score.” It will give the score like this:

8.7 (********) CLICK_BELOW, SUBJ_REMOVE, MAILTO_WITH_SUBJ, MAILTO_WITH_SUBJ_REMOVE, MAILTO_TO_REMOVE, CLICK_HERE_LINK, MAILTO_LINK

See all those asterisks? There are eight of them. If the spam score were 9.3, there would be nine of them. The more asterisks there are, the more likely it is that the message is spam. You also get a few codes that indicate what contributed to the spam score. They can be helpful, but you have to know what they all mean, and there are hundreds of different codes.

To take advantage of SpamAssassin, you’ll need to use your e-mail client’s built-in filtering system, have it look at the X-Spam-Score header, and set it to filter the message if it has more asterisks than you want. How many do you want? That’s subjective. Look at the scores for the spam you get and compare those scores with the scores for non-spam. Pick a number and tweak it if too much spam is getting through or if legitimate e-mail is being filtered out.

Webster University faculty and staff use a wide variety of e-mail clients, so it’s not practical in this article to go into detail about how to implement these filtering systems.
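The logic itself is the same in any client, though. As a rough illustration only, the Python sketch below adds up the example values from above (the reply rule is entered as a negative value so that the total comes to the 0.3 given in the text), then reads an X-Spam-Score header and counts its asterisks. The function names, the rule labels, the threshold of five, and every part of the sample message except the X-Spam-Score value are assumptions made for this example. It is not the filter syntax of any real e-mail client.

```python
import re
from email import message_from_string

# The article's example: two spam-like rules and one not-spam-like rule.
# The rule labels are descriptive placeholders, not real SpamAssassin codes.
example_rules = {"click here in body": 3.0,
                 "subject in all caps": 1.1,
                 "looks like a reply": -3.8}
total = round(sum(example_rules.values()), 1)  # rounded to one decimal place

# Sample message. Only the X-Spam-Score value comes from the article;
# the other headers and the body are invented.
RAW_MESSAGE = """\
From: someone@example.com
To: you@webster.edu
Subject: REMOVE ME
X-Spam-Score: 8.7 (********) CLICK_BELOW, SUBJ_REMOVE, MAILTO_WITH_SUBJ, MAILTO_LINK

Click below to be removed.
"""


def parse_spam_score(header_value):
    """Split a header value into (score, number of asterisks, list of codes)."""
    pattern = r"\s*(-?\d+(?:\.\d+)?)\s*(?:\((\**)\))?\s*(.*)"
    match = re.match(pattern, header_value, re.DOTALL)
    if match is None:
        return None
    score = float(match.group(1))
    stars = len(match.group(2) or "")
    codes = [code.strip() for code in match.group(3).split(",") if code.strip()]
    return score, stars, codes


def is_probably_spam(raw_message, star_threshold=5):
    """Return True if the header shows at least star_threshold asterisks."""
    message = message_from_string(raw_message)
    header = message.get("X-Spam-Score")
    if header is None:
        return False  # no score header, so leave the message alone
    parsed = parse_spam_score(header)
    if parsed is None:
        return False  # header present but not in the expected layout
    _, stars, _ = parsed
    return stars >= star_threshold


print(total)
print(parse_spam_score("8.7 (********) CLICK_BELOW, SUBJ_REMOVE"))
print(is_probably_spam(RAW_MESSAGE, star_threshold=5))
```

In a typical mail client the equivalent rule is a filter on the X-Spam-Score header that checks whether it contains a run of asterisks of the length you picked. Raising or lowering star_threshold here corresponds to tweaking that number.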
You can call your friendly neighborhood help desk to arrange to have someone get you set up the way you want. But you can also explore your e-mail client and discover how to access the filters. If you do this, you’ll also discover that the filters can do all sorts of other things that you might find useful. Playing with your computer to see how it works is mostly a good thing.

You can read more about spam at Webster’s technology page at: http://www.webster.edu/depts/acs/femail.html#getting

A great article on setting up filters is at: http://www.webster.edu/depts/acs/kb/filter.html

An article on setting up SpamAssassin for Outlook is at: http://www.thompsonic.com/util/antispam/index.html

The SpamAssassin website is at: http://www.webster.edu/depts/acs/femail.html#getting, but it’s a bit technical.

A definitive article on Bayesian filters is at: http://www.paulgraham.com/spam.html

A GOOD E-MAIL FILTERING STRATEGY

You can set up your filters any way you want. Here’s one good way:

First, make sure that if the sender’s address ends in “webster.edu” the message gets sent to your Inbox.

Second, make sure that if the sender’s address is in your Address Book, the message also gets sent to your Inbox.

Third, if your e-mail client has one, let the Bayesian filter do its thing. If it has been trained well, you can filter out as much as 98 percent of your spam. Send messages it thinks are spam to the trash, or, if you’re worried about accidentally trashing a good message, set up a Spam folder and send them there. You can review all the messages in that folder quickly and then trash them.

If you don’t have a Bayesian filter, use the SpamAssassin filter to sort spam out. You’ll have to tinker with the settings (the number of asterisks you look for). Send filtered messages to a Spam folder so you can review them before you trash them.

Finally, add specific filters for whatever spam gets through.
Look for repeating “From” addresses and Subject lines, and for words inside the message that you are sure will never appear in a legitimate e-mail message. Send messages that match to your Spam folder.
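The order of the steps in this strategy matters, because the first rule that matches decides where a message goes. The Python sketch below is one hedged way to picture that ordering. It is not the rule language of any real e-mail client, and the Message class, the choose_folder function, the default threshold of five asterisks, and the sample addresses are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sender: str
    subject: str
    body: str
    spam_stars: int = 0                      # asterisks in X-Spam-Score
    bayes_says_spam: Optional[bool] = None   # None = client has no Bayesian filter


def choose_folder(msg, address_book, star_threshold=5,
                  blocked_senders=(), blocked_phrases=()):
    """Apply the strategy's steps in order and return 'Inbox' or 'Spam'."""
    sender = msg.sender.lower()
    known = {address.lower() for address in address_book}

    # First: anything from a webster.edu address goes to the Inbox.
    if sender.endswith("webster.edu"):
        return "Inbox"
    # Second: so does anything from someone in the address book.
    if sender in known:
        return "Inbox"
    # Third: trust the Bayesian filter if the client has one;
    # otherwise fall back to counting SpamAssassin's asterisks.
    if msg.bayes_says_spam is not None:
        if msg.bayes_says_spam:
            return "Spam"
    elif msg.spam_stars >= star_threshold:
        return "Spam"
    # Finally: hand-made filters for whatever still gets through.
    text = (msg.subject + " " + msg.body).lower()
    if sender in {s.lower() for s in blocked_senders}:
        return "Spam"
    if any(phrase.lower() in text for phrase in blocked_phrases):
        return "Spam"
    return "Inbox"


book = {"friend@example.org"}
print(choose_folder(Message("boss@webster.edu", "Project info", "...", spam_stars=9), book))
print(choose_folder(Message("x@bulk.example", "WIN NOW", "...", spam_stars=8), book))
```

Every “Spam” result here means the Spam folder rather than the trash, so that anything caught by mistake can still be reviewed before it is deleted.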