Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, the features used, and the task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment analysis datasets, sometimes providing a new state-of-the-art performance level.
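The SVM variant mentioned in (iii) replaces raw term counts with NB log-count ratios before training a linear SVM. A minimal sketch of that feature transformation, assuming binarized bag-of-words inputs and add-alpha smoothing (the helper names and the toy setup here are illustrative, not the paper's exact implementation):

```python
import numpy as np

def nb_log_count_ratio(X_pos, X_neg, alpha=1.0):
    """NB log-count ratios r over a vocabulary.

    X_pos, X_neg: (n_docs, n_features) binary indicator matrices for the
    positive and negative training documents. alpha is the smoothing
    parameter (assumed add-one by default in this sketch).
    """
    p = alpha + X_pos.sum(axis=0)          # smoothed positive counts
    q = alpha + X_neg.sum(axis=0)          # smoothed negative counts
    # r_j > 0 means feature j is more indicative of the positive class.
    return np.log((p / p.sum()) / (q / q.sum()))

def nbsvm_features(X, r):
    """Elementwise-scale binarized features by the log-count ratios.

    The scaled matrix is then fed to an ordinary linear SVM trainer
    (e.g. LIBLINEAR) in place of the raw counts.
    """
    return X * r

# Toy example: one positive doc containing only "good",
# one negative doc containing only "bad".
X_pos = np.array([[1, 0]])
X_neg = np.array([[0, 1]])
r = nb_log_count_ratio(X_pos, X_neg)
# "good" gets a positive ratio, "bad" a negative one.
```

The only change to a standard SVM pipeline is this preprocessing step; the classifier itself remains a plain linear SVM.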
