Sentiment classification with interpolated information diffusion kernels

Information diffusion kernels - similarity metrics in non-Euclidean information spaces - have been found to produce state of the art results for document classification. In this paper, we present a novel approach to global sentiment classification using these kernels. We carry out a large array of experiments addressing the well-known movie review data set of Pang and Lee, a de facto benchmark, comparing information diffusion kernels with a standard RBF kernel machine. Our results show that interpolation of unigram and bigram information is beneficiary for sentiment classification.

[1]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[2]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[3]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[4]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[5]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[6]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[7]  Nigel Collier,et al.  Sentiment Analysis using Support Vector Machines with Diverse Information Sources , 2004, EMNLP.

[8]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[9]  John D. Lafferty,et al.  Diffusion Kernels on Statistical Manifolds , 2005, J. Mach. Learn. Res..

[10]  Shie Mannor,et al.  A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..

[11]  Vincent Ng,et al.  Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews , 2006, ACL.

[12]  Siddharth Patwardhan,et al.  Feature Subsumption for Opinion Analysis , 2006, EMNLP.

[13]  Alistair Kennedy,et al.  SENTIMENT CLASSIFICATION of MOVIE REVIEWS USING CONTEXTUAL VALENCE SHIFTERS , 2006, Comput. Intell..

[14]  Guy Lebanon Sequential Document Representations and Simplicial Curves , 2006, UAI.

[15]  Diego Reforgiato Recupero,et al.  Sentiment Analysis: Adjectives and Adverbs are Better than Adjectives Alone , 2007, ICWSM.

[16]  Sabine Bergler,et al.  All Blogs Are Not Made Equal: Exploring Genre Differences in Sentiment Tagging of Blogs , 2007, ICWSM.