BoosTexter: A Boosting-based System for Text Categorization

This work focuses on algorithms that learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation of these algorithms for text categorization, called BoosTexter. We present results comparing the performance of BoosTexter with that of several other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
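To make the idea of boosting for text categorization concrete, the following is a minimal sketch only. It is the textbook binary AdaBoost with word-presence decision stumps, not the multiclass algorithms or the BoosTexter implementation, whose details the abstract does not give. The function names `train_adaboost` and `predict`, the set-of-words document representation, and the toy data are all assumptions made for illustration.

```python
import math


def stump_output(word, polarity, doc):
    """Weak hypothesis: vote `polarity` if `word` occurs in `doc`, else the opposite."""
    return polarity if word in doc else -polarity


def train_adaboost(docs, labels, rounds=5):
    """Train binary AdaBoost on documents given as sets of words.

    docs   -- list of sets of words (hypothetical representation)
    labels -- list of +1 / -1 class labels
    Returns a list of (alpha, word, polarity) weighted stumps.
    """
    n = len(docs)
    weights = [1.0 / n] * n  # start from a uniform distribution over examples
    vocab = sorted(set().union(*docs))
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for word in vocab:
            for polarity in (1, -1):
                err = sum(w for w, d, y in zip(weights, docs, labels)
                          if stump_output(word, polarity, d) != y)
                if best is None or err < best[0]:
                    best = (err, word, polarity)
        err, word, polarity = best
        # Clamp the error so the logarithm below stays finite.
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, word, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_output(word, polarity, d))
                   for w, d, y in zip(weights, docs, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, doc):
    """Classify `doc` by the sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_output(word, polarity, doc)
                for alpha, word, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy data (invented for illustration): +1 = finance, -1 = sports.
docs = [{"profit", "quarter"}, {"profit", "shares"},
        {"match", "goal"}, {"goal", "team"}]
labels = [1, 1, -1, -1]
ensemble = train_adaboost(docs, labels, rounds=3)
```

Each round selects the single word whose presence or absence best separates the reweighted training set, so the final classifier is a weighted vote over a handful of word tests. A multiclass, multi-label setting such as the one the abstract targets would need a different reduction or algorithm than this binary sketch.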
