
It's All in the Numbers
Year 2003 - Week 22


This week, we're just going to bring you an answer to a problem everybody faces...

What do AOL, the Virginia legislature, and the Reverend Thomas Bayes (1701-1761) have in common? They have all recently entered the fight against spam, or UBE (Unsolicited Bulk Email). The difference, so far, is in the success rates. When AOL sued spammers last month and Virginia made spamming a crime, did you notice a drop in the spam you received? Really? Me, neither.

Traditionally, spam is fought using lists of rules, usually called "blacklists" (for people who should never get through) and "whitelists" (for people who should always get through). Unfortunately, many spammers change their addresses (and domains) every few minutes; we can't just block every email from anybody@yahoo.com!
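A minimal Python sketch of the rule-list approach, assuming exact-match address lists; the addresses and the name `classify_by_lists` are hypothetical. It shows how a spammer who switches to a fresh address slips past the blacklist:

```python
# Hypothetical exact-match rule lists.
WHITELIST = {"friend@example.com"}
BLACKLIST = {"deals123@example.net"}


def classify_by_lists(sender: str) -> str:
    """Return 'accept', 'reject', or 'unknown' for a sender address."""
    sender = sender.strip().lower()
    if sender in WHITELIST:
        return "accept"
    if sender in BLACKLIST:
        return "reject"
    return "unknown"  # no rule matches, so the lists cannot decide


print(classify_by_lists("deals123@example.net"))  # reject
# The same spammer, one address later:
print(classify_by_lists("deals124@example.net"))  # unknown
```

Widening a rule to a whole domain would catch the new address, but it would also block every legitimate sender on that domain.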

Enter Rev. Bayes, an enigmatic figure who may just have offered one of the biggest advances ever in Artificial Intelligence, more than 200 years before the dawn of computers. Simply put, Bayes' Rule is a mathematical way to update one's beliefs based on new evidence.
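For readers who want the formula itself, the first line below is Bayes' Rule in its standard form, with H a hypothesis and E the new evidence. The second line applies it to spam; the symbols S ("the message is spam") and w ("the message contains a particular word") are our own labels for illustration, not notation from any particular filter:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}

P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid \neg S)\,P(\neg S)}
```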

For example, imagine that a newly licensed driver goes out for her first time behind the wheel and has a wreck. Based solely on her past experience, there is a 100% probability that she will wreck any car she drives! Suppose, however, that our driver is the experimental type. She knows wrecks aren't really that likely, so she conducts an experiment: she puts white marbles (representing safe car trips) and black marbles (representing wrecks) into a bag. Every time she has a wreck, she adds a black marble, and every time she has a safe drive, she adds a white one.

Pretty soon, the white marbles outnumber the black ones by about 999 to 1, and she realizes that her odds of wrecking a car on a given trip are about 1 in 1,000 (0.1%), much closer to 0% than to 100%.
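The marble bag is simply a running tally. A minimal Python sketch of the estimate it produces, assuming the estimate is just the fraction of black marbles in the bag (the function name `wreck_probability` is hypothetical):

```python
def wreck_probability(black: int, white: int) -> float:
    """Estimate the chance of a wreck as the fraction of black marbles."""
    total = black + white
    if total == 0:
        raise ValueError("the bag is empty, so there is no evidence yet")
    return black / total


# After her first trip (a wreck), the bag holds one black marble.
print(wreck_probability(black=1, white=0))    # 1.0
# After 999 safe trips and still only that one wreck:
print(wreck_probability(black=1, white=999))  # 0.001
```

This sketch only counts; a fuller Bayesian treatment would also start from a prior belief instead of an empty bag.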

What does this have to do with spam? Well, suppose I get a spam with the word "debt" in the subject. Of the next 99 emails with "debt" in the subject line, 98 are spam and 1 is a real email from a friend. Counting the first one, 99 of the 100 "debt" emails I have seen were spam, so I can estimate that an email with "debt" in its subject is spam 99% of the time. If I build a list of thousands of words (my real lists contain 14,000 good words and 27,000 spam words) and combine their probabilities (with a little help from Rev. Bayes), I can filter my email with far better than 99% accuracy.
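A minimal Python sketch of how such word probabilities might be combined, assuming the combining formula from Paul Graham's "A Plan for Spam": each word's probability is clamped to between 0.01 and 0.99, and the words are then combined as naive Bayes, treating words as independent and spam and good mail as equally likely. The counts in `COUNTS` and all function names are hypothetical, the per-word estimate is the simple fraction from the "debt" example rather than Graham's exact formula, and K9's own method may differ:

```python
import math
import re

# Hypothetical training counts: word -> (spam emails containing it, good emails containing it)
COUNTS = {
    "debt": (98, 1),
    "free": (40, 10),
    "lunch": (2, 30),
}


def word_spam_probability(spam_count: int, good_count: int) -> float:
    """Fraction of emails containing the word that were spam, clamped to
    [0.01, 0.99] so no single word is ever treated as certain proof."""
    total = spam_count + good_count
    if total == 0:
        return 0.5  # never-seen word: assume no evidence either way
    return min(0.99, max(0.01, spam_count / total))


def combined_spam_probability(word_probs: list[float]) -> float:
    """Naive-Bayes combination with equal priors:
    prod(p) / (prod(p) + prod(1 - p)), computed as a sum of log-odds."""
    log_odds = sum(math.log(p / (1 - p)) for p in word_probs)
    # Numerically stable logistic function of the summed log-odds.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1 + z)


def score(subject: str) -> float:
    """Spam probability of a subject line, using only words with known counts."""
    words = set(re.findall(r"[a-z']+", subject.lower()))
    probs = [word_spam_probability(*COUNTS[w]) for w in words if w in COUNTS]
    return combined_spam_probability(probs)


print(round(score("Get out of debt FREE"), 3))  # 0.997
print(score("Lunch on Friday?"))                # about 0.06
```

Summing log-odds gives the same answer as multiplying the probabilities directly, but it avoids floating-point underflow when a message contains many words.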

There are a number of free (and safe) tools to do just this. We recommend the K9 filter from keir.net; it's just a little tricky to set up, but it's easy to use and very effective. There are other filters out there; this is just one we've used.

Hope this helps you get rid of the clutter and get back to work!

Sincerely,

Ed Cottrell, Editor and President





Related Links:
Topsail Consulting
Paul Graham's Spam Page
Suggest a Topic
If you have a topic that you would like to see discussed in a future edition of Topsail Topics, please email us.


Tip of the week:
Do you know what you're selling? Not what you want to sell, but what you actually do sell: the difference could make or break your business!
 
CONTACT US FOR FURTHER INFORMATION
If you would like to subscribe to this newsletter, please click here.
 
Copyright 2003
All opinions expressed in this article are solely those of Topsail Consulting, Inc., and should not be construed as an endorsement or disparagement of any product or products. This newsletter may be freely redistributed if copied in its entirety. Partial reprints or other uses require permission from Topsail Consulting, Inc.