If We Want Your Opinion

Sentiment has traditionally been considered a "deep" attribute of writing, often requiring the interpretation of figurative language to uncover the writer's intention. The natural language processing community has become increasingly interested in automatically detecting expressions of opinion and measuring the intensity of the writer's emotions. Despite the depth and abstraction often associated with expressions of sentiment, we apply strictly lexical analysis to opinions expressed about books and find that machine learning techniques can resolve even fine-grained distinctions between opinions. With an averaged perceptron classifier trained with a word subsequence kernel, we achieve an accuracy of 89% when distinguishing between 1- and 5-star reviews. Further, the same model yields significant separation when scoring intermediate reviews, making distinctions that even human annotators find difficult. We detail the collection of data for supervised training, present the results of our sentiment classifier, and discuss why we believe this approach is effective.
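To make the classifier described above more concrete, the following is a minimal sketch of an averaged perceptron in kernel (dual) form, paired with a gap-weighted word subsequence kernel. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the subsequence length, the gap decay, or whether the kernel is normalized, so `p`, `lam`, the normalization step, and all function names here are hypothetical choices. The kernel uses the standard dynamic-programming recursion for gap-weighted subsequences, applied to word tokens rather than characters.

```python
import math
from typing import List, Sequence

Tokens = Sequence[str]


def subsequence_kernel(s: Tokens, t: Tokens, p: int = 2, lam: float = 0.5) -> float:
    """Gap-weighted kernel over word subsequences of length p (assumes p >= 1).

    Each shared subsequence contributes lam ** (span in s + span in t),
    so occurrences with gaps count less than contiguous ones.
    """
    n, m = len(s), len(t)
    if min(n, m) < p:
        return 0.0
    # dps[i][j]: weight of common subsequences of the current length
    # whose last words are exactly s[i] and t[j].
    dps = [[lam * lam if s[i] == t[j] else 0.0 for j in range(m)] for i in range(n)]
    for _ in range(p - 1):
        # dp is padded by one row and column; it accumulates dps over all
        # earlier positions, decayed by lam for every word skipped.
        dp = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                dp[i][j] = (dps[i - 1][j - 1] + lam * dp[i - 1][j]
                            + lam * dp[i][j - 1] - lam * lam * dp[i - 1][j - 1])
        dps = [[lam * lam * dp[i][j] if s[i] == t[j] else 0.0 for j in range(m)]
               for i in range(n)]
    return sum(map(sum, dps))


def normalized_kernel(s: Tokens, t: Tokens, p: int = 2, lam: float = 0.5) -> float:
    """Cosine-style normalization so long reviews do not dominate (an assumption)."""
    denom = math.sqrt(subsequence_kernel(s, s, p, lam) * subsequence_kernel(t, t, p, lam))
    return subsequence_kernel(s, t, p, lam) / denom if denom > 0 else 0.0


def train_averaged_perceptron(docs: List[Tokens], labels: List[int],
                              epochs: int = 5, p: int = 2, lam: float = 0.5) -> List[float]:
    """Return averaged dual weights; labels are +1 (e.g. 5-star) or -1 (e.g. 1-star)."""
    n = len(docs)
    gram = [[normalized_kernel(a, b, p, lam) for b in docs] for a in docs]
    alpha = [0.0] * n      # mistake counts per training example
    alpha_sum = [0.0] * n  # running sum of alpha after every step, for averaging
    for _ in range(epochs):
        for i in range(n):
            score_i = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score_i <= 0:  # misclassified (or undecided): update
                alpha[i] += 1.0
            for j in range(n):
                alpha_sum[j] += alpha[j]
    steps = max(1, epochs * n)
    return [a / steps for a in alpha_sum]


def score(doc: Tokens, docs: List[Tokens], labels: List[int],
          avg_alpha: List[float], p: int = 2, lam: float = 0.5) -> float:
    """Real-valued score; the sign gives the class, the magnitude a rough intensity."""
    return sum(a * y * normalized_kernel(x, doc, p, lam)
               for a, y, x in zip(avg_alpha, labels, docs) if a)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the paper's review corpus.
    train_docs = ["a great book".split(), "a terrible book".split()]
    train_labels = [1, -1]
    weights = train_averaged_perceptron(train_docs, train_labels, epochs=3)
    print(score("great book".split(), train_docs, train_labels, weights))
    print(score("terrible book".split(), train_docs, train_labels, weights))
```

Because `score` returns a real value rather than only a class label, a model trained on the extreme classes can in principle be used to rank intermediate reviews, which is the kind of separation the abstract reports. The quadratic cost of the Gram matrix in this sketch would need a more efficient kernel computation for a corpus of realistic size.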
