Not Too Hot, Not Too Cold: The Bundled-SVM is Just Right!

The Support Vector Machine (SVM) typically outperforms other algorithms on text classification problems, but requires training time roughly quadratic in the number of training documents. In contrast, linear-time algorithms such as Naive Bayes perform worse but can easily handle huge training sets. In this paper, we describe a technique that creates a continuum of classifiers between the SVM and a Naive Bayes-like algorithm. Included in that continuum is a classifier that approximates SVM performance with linear training time. Another classifier on this continuum can outperform the SVM, yielding a breakeven point that beats other published results on Reuters-21578. We give empirical and theoretical evidence that our hybrid approach successfully navigates the tradeoffs between speed and performance.
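
The abstract does not spell out how the continuum is built, so the following is only a minimal sketch of one plausible reading of "bundling": same-class documents are averaged in groups of a fixed size before any SVM is trained. The function name `bundle_documents`, the `bundle_size` parameter, and the choice of averaging (rather than, say, summing) are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def bundle_documents(vectors, labels, bundle_size):
    """Average same-class term-count vectors in groups of `bundle_size`.

    Hypothetical helper: the grouping order and the use of a mean are
    illustrative assumptions, not the paper's stated procedure.
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")

    # Collect the documents of each class, preserving input order.
    by_class = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_class[label].append(vec)

    bundled_vectors, bundled_labels = [], []
    for label, docs in by_class.items():
        # Walk through the class in consecutive chunks of bundle_size.
        for start in range(0, len(docs), bundle_size):
            group = docs[start:start + bundle_size]
            # Component-wise mean of the vectors in this chunk.
            mean = [sum(column) / len(group) for column in zip(*group)]
            bundled_vectors.append(mean)
            bundled_labels.append(label)
    return bundled_vectors, bundled_labels


# Toy term-count vectors for two classes (made-up data).
toy_vectors = [[1, 0], [3, 0], [0, 2], [0, 4], [5, 0]]
toy_labels = ["a", "a", "b", "b", "a"]

full_set = bundle_documents(toy_vectors, toy_labels, 1)      # one bundle per document
mid_point = bundle_documents(toy_vectors, toy_labels, 2)     # fewer, coarser examples
one_per_class = bundle_documents(toy_vectors, toy_labels, 10)  # one centroid per class
```

Under this reading, a bundle size of 1 leaves the training set unchanged, which corresponds to the ordinary SVM end of the continuum, while a bundle size at least as large as every class collapses each class to a single averaged vector, which resembles the Naive Bayes-like end. Intermediate sizes shrink the number of examples an SVM trainer would see, which is where the speed and performance tradeoff would be tuned.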
