Context-sensitive learning methods for text categorization

Two recently implemented machine-learning algorithms, RIPPER and sleeping-experts for phrases, are evaluated on a number of large text categorization problems. Both algorithms construct classifiers that allow the “context” of a word w to affect how (or even whether) the presence or absence of w contributes to a classification. However, RIPPER and sleeping-experts differ radically in many other respects, including their notions of what constitutes a context, their ways of combining contexts to construct a classifier, their methods of searching for a combination of contexts, and their criteria for which contexts to include in such a combination. In spite of these differences, both RIPPER and sleeping-experts perform extremely well across a wide variety of categorization problems, generally outperforming previously applied learning methods. We view this result as a confirmation of the usefulness of classifiers that represent contextual information.
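To make the idea of a context-sensitive classifier concrete, the following is a minimal sketch of one possible notion of context: a rule is a set of words that must all occur in a document, so a word contributes to the classification only when its companion words are also present. The rule set, the example words, and the function name `rule_classify` are hypothetical and chosen purely for illustration; this is not the RIPPER or sleeping-experts algorithm, neither of which is specified in this abstract.

```python
from typing import FrozenSet, List, Set

# Hypothetical rule set for an illustrative category.
# Each rule is a conjunction: every word in the rule must occur in the document.
RULES: List[FrozenSet[str]] = [
    frozenset({"wheat", "export"}),
    frozenset({"grain", "tonnes"}),
]


def rule_classify(doc_words: Set[str], rules: List[FrozenSet[str]] = RULES) -> bool:
    """Return True if at least one rule is fully satisfied by the document.

    A single word such as "wheat" has no effect by itself; it only
    contributes when the rest of its rule (its context) is also present.
    """
    return any(rule <= doc_words for rule in rules)


if __name__ == "__main__":
    # "wheat" appears with "export": the first rule fires.
    print(rule_classify({"wheat", "export", "prices"}))
    # "wheat" appears without its context: no rule fires.
    print(rule_classify({"wheat", "farm"}))
```

In this sketch the presence of "wheat" changes the prediction only in documents that also contain "export", which is the sense in which the context of a word affects whether it contributes at all.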
