Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales

We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star".We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

[1]  P. McCullagh Regression Models for Ordinal Data , 1980 .

[2]  Davi Geiger,et al.  Occlusions, Discontinuities, and Epipolar Lines in Stereo , 1998, ECCV.

[3]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[4]  Nello Cristianini,et al.  Advances in Kernel Methods - Support Vector Learning , 1999 .

[5]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[6]  Eric Horvitz,et al.  Attention-Sensitive Alerting , 1999, UAI.

[7]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[8]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[9]  Pero Subasic,et al.  Affect analysis of text using fuzzy semantic typing , 2001, IEEE Trans. Fuzzy Syst..

[10]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[11]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[12]  Éva Tardos,et al.  Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields , 2002, JACM.

[13]  David M. Pennock,et al.  Mining the peanut gallery: opinion extraction and semantic classification of product reviews , 2003, WWW '03.

[14]  Hong Yu,et al.  Towards Answering Opinion Questions: Separating Facts from Opinions and Identifying the Polarity of Opinion Sentences , 2003, EMNLP.

[15]  Hiroshi Ishikawa,et al.  Exact Optimization for Markov Random Fields with Convex Priors , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Henry Lieberman,et al.  A model of textual affect sensing using real-world knowledge , 2003, IUI '03.

[17]  Kevyn Collins-Thompson,et al.  A Language Modeling Approach to Predicting Reading Difficulty , 2004, NAACL.

[18]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[19]  Yuji Matsumoto,et al.  Modeling Category Structures with a Kernel Function , 2004, CoNLL.

[20]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[21]  Andrew W. Moore,et al.  Locally Weighted Learning , 1997, Artificial Intelligence Review.

[22]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[23]  Janyce Wiebe,et al.  Just How Mad Are You? Finding Strong and Weak Opinion Clauses , 2004, AAAI.

[24]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[25]  R. E. Kent Robert E , 2005 .

[26]  Ronald Rosenfeld,et al.  Semi-supervised learning with graphs , 2005 .

[27]  Moshe Koppel,et al.  THE IMPORTANCE OF NEUTRAL EXAMPLES FOR LEARNING SENTIMENT , 2006, Comput. Intell..