Terms-based discriminative information space for robust text classification

With the popularity of Web 2.0, there has been a phenomenal increase in the utility of text classification in applications like document filtering and sentiment categorization. Many of these applications demand that the classification method be efficient and robust, yet produce accurate categorizations by using the terms in the documents only. In this paper, we propose a novel and efficient method using terms-based discriminative information space for robust text classification. Terms in the documents are assigned weights according to the discrimination information they provide for one category over the others. These weights also serve to partition the terms into category sets. A linear opinion pool is adopted for combining the discrimination information provided by each set of terms to yield a feature space (discriminative information space) having dimensions equal to the number of classes. Subsequently, a discriminant function is learned to categorize the documents in the feature space. This classification methodology relies upon corpus information only, and is robust to distribution shifts and noise. We develop theoretical parallels of our methodology with generative, discriminative, and hybrid classifiers. We evaluate our methodology extensively with five different discriminative term weighting schemes on six data sets from different application areas. We give a side-by-side comparison with four well-known text classification techniques. The results show that our methodology consistently outperforms the rest, especially when there is a distribution shift from training to test sets. Moreover, our methodology is simple and effective for different application domains and training set sizes. It is also fast with a small and tunable memory footprint.
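The pipeline summarized above (weight terms, partition them into category sets, pool each set's weights into one score per class, then classify in that low-dimensional space) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the weight is a Laplace-smoothed ratio of document-frequency probabilities (a relative-risk-style measure, not necessarily one of the five schemes evaluated), each term is assigned to the category where that ratio is largest, the pool is a plain average over the known tokens of a document, and the learned discriminant function is replaced by a simple argmax. The function names `fit_term_weights`, `transform`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_term_weights(docs, labels, smoothing=1.0):
    """Return (classes, {term: (category, weight)}) from tokenized documents.

    Assumption: the weight is a smoothed ratio P(term | c) / P(term | not c),
    and a term joins the category set where this ratio is largest.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)          # number of documents per class
    df = defaultdict(Counter)         # df[term][class] = document frequency
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1

    total = len(docs)
    weights = {}
    for term, counts in df.items():
        best_class, best_weight = None, 0.0
        for c in classes:
            p_in = (counts[c] + smoothing) / (n_docs[c] + 2 * smoothing)
            rest = sum(counts.values()) - counts[c]
            p_out = (rest + smoothing) / (total - n_docs[c] + 2 * smoothing)
            weight = p_in / p_out
            if weight > best_weight:
                best_class, best_weight = c, weight
        weights[term] = (best_class, best_weight)
    return classes, weights


def transform(tokens, classes, weights):
    """Map a document to one pooled score per class (the feature space)."""
    scores = dict.fromkeys(classes, 0.0)
    known = [t for t in tokens if t in weights]
    for t in known:
        category, weight = weights[t]
        scores[category] += weight    # each term votes only for its own set
    n = len(known) or 1               # avoid division by zero for unseen text
    return [scores[c] / n for c in classes]


def predict(tokens, classes, weights):
    """Stand-in for the learned discriminant: pick the largest pooled score."""
    features = transform(tokens, classes, weights)
    best = max(range(len(classes)), key=features.__getitem__)
    return classes[best]


# Toy corpus, invented purely for illustration.
train_docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "agenda", "notes"],
    ["agenda", "project", "meeting"],
]
train_labels = ["spam", "spam", "ham", "ham"]

classes, weights = fit_term_weights(train_docs, train_labels)
print(transform(["cheap", "meeting", "cheap"], classes, weights))
print(predict(["agenda", "notes"], classes, weights))
```

In this sketch the feature vector always has as many entries as there are classes, regardless of vocabulary size, which is the property the abstract attributes to the discriminative information space. A faithful implementation would fit a discriminant function on these vectors rather than take the argmax.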
