A sequential algorithm for training text classifiers

The ability to cheaply train text classifiers is critical to their use in information retrieval, content analysis, natural language processing, and other tasks involving data which is partly or fully textual. An algorithm for sequential sampling during machine learning of statistical classifiers was developed and tested on a newswire text categorization task. This method, which we call uncertainty sampling, reduced by as much as 500-fold the amount of training data that would have to be manually classified to achieve a given level of effectiveness.
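The abstract names the method but gives no selection rule or classifier details. As a rough illustration only, the sketch below takes the common pool-based reading of uncertainty sampling: train a probabilistic classifier on the labeled examples, then request a label for the unlabeled example whose predicted class probability is closest to 0.5. The logistic-regression model, the one-query-per-round loop, and all names (`uncertainty_sample`, `oracle`, `train_logistic`, `predict_proba`) are assumptions made for this example, not details taken from the paper.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(examples, epochs=200, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent.

    `examples` is a list of (feature_list, label) pairs with labels 0 or 1.
    This stands in for whatever probabilistic classifier is being trained.
    """
    dim = len(examples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in examples:
            err = predict_proba(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[dim] += err
        w = [wj - lr * gj / len(examples) for wj, gj in zip(w, grad)]
    return w


def uncertainty_sample(pool, oracle, seed_indices, n_queries):
    """Pool-based uncertainty sampling (illustrative assumption, see lead-in).

    pool         -- list of feature lists (the unlabeled examples)
    oracle       -- hypothetical callable standing in for the human labeler
    seed_indices -- indices labeled up front; should cover both classes
    n_queries    -- number of additional labels to request
    Returns the final weights and a dict {pool index: label} in query order.
    """
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    for _ in range(n_queries):
        w = train_logistic([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the example the current classifier is least sure about.
        pick = min(unlabeled, key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))
        labeled[pick] = oracle(pool[pick])
    w = train_logistic([(pool[i], y) for i, y in labeled.items()])
    return w, labeled


if __name__ == "__main__":
    # Toy 1-D pool; the "true" class is 1 when the feature exceeds 0.3.
    pool = [[i / 50 - 1] for i in range(101)]
    oracle = lambda x: 1 if x[0] > 0.3 else 0
    _, labeled = uncertainty_sample(pool, oracle, seed_indices=[0, 100], n_queries=8)
    print("queried feature values:", [pool[i][0] for i in labeled])
```

In this toy setup each query costs one call to `oracle`, which stands in for a manual classification. Querying where the classifier is least certain spends that labeling effort on the examples the current model cannot yet classify with confidence.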
