ifile: An Application of Machine Learning to E-Mail Filtering

The rise of the World Wide Web and the ever-increasing amount of machine-readable text have made text classification an important aspect of machine learning. One specific application with the potential to affect almost every Internet user is e-mail filtering. The WorldTalk Corporation estimates that over 60 million business people use e-mail [6]. Many more use e-mail purely on a personal basis, and the pool of e-mail users is growing daily. Yet automated techniques for learning to filter e-mail have not significantly affected the e-mail market. Here, I attack problems that plague practical e-mail filtering and suggest solutions that bring us closer to the acceptance of automated classification techniques for filtering personal e-mail. I also present a filtering system, ifile, that is both effective and efficient and that has been adapted to a popular e-mail client. Results from a number of experiments show that a system such as ifile could become a useful and valuable part of any e-mail client.

[1] Jonathan Helfman, et al. Ishmail: Immediate Identification of Important Information, 1995.

[2] Jeffrey O. Kephart, et al. Incremental Learning in SwiftFile, 2000, ICML.

[3] Allen Newell, et al. The Psychology of Human-Computer Interaction, 1983.

[4] Jeffrey O. Kephart, et al. MailCat: An Intelligent Assistant for Organizing E-Mail, 1999, AGENTS '99.

[5] James Allan, et al. On-Line New Event Detection and Tracking, 1998, SIGIR.

[6] Andrew McCallum, et al. A Comparison of Event Models for Naive Bayes Text Classification, 1998, AAAI 1998.

[7] Yiming Yang, et al. Improving Text Categorization Methods for Event Tracking, 2000, SIGIR '00.

[8] Susan T. Dumais, et al. A Bayesian Approach to Filtering Junk E-Mail, 1998, AAAI 1998.

[9] Terry R. Payne. Learning Email Filtering Rules with Magi: A Mail Agent Interface, 1994.

[10] Tommi S. Jaakkola, et al. Maximum Entropy Discrimination, 1999, NIPS.

[11] Adam L. Berger, et al. Error-Correcting Output Coding for Text Classification, 1999.

[12] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.

[13] Stan Matwin, et al. Feature Engineering for Text Classification, 1999, ICML.

[14] Thomas G. Dietterich. What Is Machine Learning?, 2020, Archives of Disease in Childhood.

[15] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[16] David D. Lewis, et al. Threading Electronic Mail: A Preliminary Study, 1997, Inf. Process. Manag.

[17] William W. Cohen. Learning Rules that Classify E-Mail, 1996.

[18] Harris Drucker, et al. Support Vector Machines for Spam Categorization, 1999, IEEE Trans. Neural Networks.