Binarization Approaches to Email Categorization

Email categorization has become increasingly popular in personal information management. However, most n-way classification methods suffer from the feature unevenness problem: the features learned from training samples are distributed unevenly across folders. We argue that binarization approaches can handle this problem effectively. In this paper, three binarization techniques are implemented (one-against-rest, one-against-one and some-against-rest), each combined with two assembling techniques (round robin and elimination). Experiments on email categorization show that these binarization approaches achieve significant improvement over an n-way baseline classifier.
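
To make the terminology concrete, the following is a minimal sketch of one-against-one binarization with the two assembling strategies. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the toy mean-difference binary model, the function names, the tie-breaking rule, and the reading of elimination as a sequential knockout are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def mean_counts(docs):
    """Average term count per document for a list of token lists."""
    total = Counter()
    for tokens in docs:
        total.update(tokens)
    return {word: count / len(docs) for word, count in total.items()}


def train_pair(docs_a, docs_b):
    """Toy binary model (assumption): per-word weight = mean(a) - mean(b)."""
    mean_a, mean_b = mean_counts(docs_a), mean_counts(docs_b)
    vocab = set(mean_a) | set(mean_b)
    return {w: mean_a.get(w, 0.0) - mean_b.get(w, 0.0) for w in vocab}


def train_one_against_one(folders):
    """Train one binary model for every unordered pair of folders."""
    return {
        (a, b): train_pair(folders[a], folders[b])
        for a, b in combinations(sorted(folders), 2)
    }


def pair_winner(models, a, b, tokens):
    """Return whichever of the two folders the pairwise model prefers."""
    first, second = sorted((a, b))
    weights = models[(first, second)]
    margin = sum(weights.get(w, 0.0) for w in tokens)
    return first if margin >= 0 else second


def round_robin(models, labels, tokens):
    """Every pair plays once; the folder with the most wins is chosen."""
    votes = Counter()
    for a, b in combinations(sorted(labels), 2):
        votes[pair_winner(models, a, b, tokens)] += 1
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    return max(sorted(labels), key=lambda label: votes[label])


def elimination(models, labels, tokens):
    """Sequential knockout: the loser of each match is dropped."""
    ordered = sorted(labels)
    champion = ordered[0]
    for challenger in ordered[1:]:
        champion = pair_winner(models, champion, challenger, tokens)
    return champion


# Tiny made-up training set: folder name -> tokenized messages.
folders = {
    "work": [["meeting", "budget", "report"], ["report", "deadline"]],
    "family": [["dinner", "birthday"], ["birthday", "photos", "dinner"]],
    "travel": [["flight", "hotel"], ["hotel", "booking", "flight"]],
}
models = train_one_against_one(folders)
message = ["flight", "booking", "report"]

print(round_robin(models, folders, message))
print(elimination(models, folders, message))
```

With n folders, round robin evaluates all n(n-1)/2 pairwise models, whereas the knockout form of elimination used here evaluates only n-1 of them, at the cost of depending on the order in which folders are matched.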
