Constructing informative prior distributions from domain knowledge in text classification

Supervised learning approaches to text classification must in practice often work with small, unsystematically collected training sets. The usual alternative to supervised learning is to build classifiers by hand, using a domain expert's understanding of which features of the text are related to the class of interest. This is expensive, requires some sophistication about linguistics and classification, and makes it difficult to combine weak predictors. We propose instead combining domain knowledge with training examples in a Bayesian framework. Domain knowledge is used to specify a prior distribution for the parameters of a logistic regression model, and labeled training data are used to produce a posterior distribution, whose mode we take as the final classifier. We show on three text categorization data sets that this approach can rescue what would otherwise be disastrously bad training situations, producing much more effective classifiers.
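
The idea of taking the posterior mode under an informative prior can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes independent Gaussian priors on the weights (the abstract does not name a prior family), omits an intercept, and uses plain gradient descent. The vocabulary, the prior mean of 2.0 on the first word, and the two training documents are all hypothetical. The first word never occurs in the training data, so only the prior can say anything about it.

```python
import numpy as np

def fit_map_logistic(X, y, prior_mean, prior_var, lr=0.1, n_iter=2000):
    """Posterior mode of logistic regression weights under independent
    Gaussian priors N(prior_mean[j], prior_var[j]) (an assumed prior family)."""
    w = prior_mean.astype(float).copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))        # predicted P(y = 1 | x)
        # Gradient of the negative log posterior: likelihood term + prior term.
        grad = X.T @ (p - y) + (w - prior_mean) / prior_var
        w -= lr * grad
    return w

# Hypothetical toy data: columns are counts of three words.
# Word 0 never appears in the two training documents.
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
y = np.array([1.0, 0.0])

prior_var = np.ones(3)
flat_mean = np.zeros(3)                      # no domain knowledge
informed_mean = np.array([2.0, 0.0, 0.0])    # expert: word 0 signals the class

w_flat = fit_map_logistic(X, y, flat_mean, prior_var)
w_informed = fit_map_logistic(X, y, informed_mean, prior_var)

# Score a new document that contains only word 0.
x_new = np.array([1.0, 0.0, 0.0])
p_flat = 1.0 / (1.0 + np.exp(-(x_new @ w_flat)))
p_informed = 1.0 / (1.0 + np.exp(-(x_new @ w_informed)))
print(p_flat, p_informed)
```

With the zero-mean prior, the weight for the unseen word stays at zero and the new document is scored at chance. With the informative prior, that weight stays at its prior mean and the document is scored as likely positive, which is the kind of rescue from a poor training set that the abstract describes.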
