A Weighted Maximum Entropy Language Model for Text Classification

The maximum entropy (ME) approach has been extensively used in natural language processing tasks such as language modeling, part-of-speech tagging, text segmentation, and text classification. Previous work on text classification with maximum entropy modeling has relied on binary-valued features or counts of feature words. In this work, we apply maximum entropy modeling to text classification in a way that differs from how it has been used so far: weights are used both to select the features of the model and to emphasize the importance of each of them in the classification task. We use the chi-square (χ²) test to assess the contribution of each candidate feature, rank the features by their χ² scores, and retain the highest-ranked ones as the selected features of the model. Instead of using maximum entropy modeling in the classical way, we then use the χ² values to weight the features of the model, thereby giving each of them a different importance. The method has been evaluated on the Reuters-21578 dataset for text classification tasks, giving very promising results and performing comparably to some state-of-the-art systems in the classification field.
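
The weighting idea can be made concrete with a small sketch. In the classical conditional maximum entropy model, P(c|d) ∝ exp(Σ_i λ_i f_i(d, c)); the variant described above scales each feature f_i by its χ² score w_i before the parameters λ_i are estimated. Below is a minimal illustration in Python using scikit-learn, where multinomial logistic regression serves as the maximum entropy classifier; the toy documents, the cutoff k, and the exact rescaling step are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2
from sklearn.linear_model import LogisticRegression

# Toy two-class corpus standing in for Reuters-21578 (hypothetical examples).
docs = [
    "company reports higher quarterly profit",
    "quarterly earnings rise on strong sales",
    "wheat harvest hit by drought",
    "grain exports fall after poor harvest",
]
labels = ["earn", "earn", "grain", "grain"]

# Bag-of-words counts for the candidate word features.
vec = CountVectorizer()
X = vec.fit_transform(docs)

# Chi-square score of each candidate feature against the class labels.
scores, _ = chi2(X, labels)

# Feature selection: keep only the k highest-ranked features.
k = 8
top = np.argsort(scores)[-k:]

# Feature weighting: scale each kept feature by its chi-square value,
# so the maximum entropy model sees chi-square-weighted inputs.
Xw = X[:, top].multiply(scores[top]).tocsr()

# Multinomial logistic regression is the standard conditional
# maximum entropy classifier.
clf = LogisticRegression(max_iter=1000).fit(Xw, labels)
print(clf.predict(Xw))  # sanity check on the training documents
```

Whether the χ² weights enter as a fixed rescaling of the feature values (as here) or directly inside the model's exponent is a design choice; the sketch takes the simplest reading.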
