Sentiment Classification Using Supervised Sub-Spacing

An important application domain for machine learning is sentiment classification. Here, the traditional approach is to represent documents using a Bag-of-Words (BOW) model, in which individual terms are used as features. However, the BOW model cannot adequately capture the variation inherent in natural language text. Term-relatedness metrics are commonly used to overcome this limitation by capturing latent semantic concepts or topics in documents. The representations produced by standard term-relatedness approaches, however, do not take the class membership of documents into account. In this work, we present a novel approach called Supervised Sub-Spacing (S3) for introducing supervision into term-relatedness extraction. S3 creates a separate sub-space for each class, within which term relations are extracted so that documents belonging to the same class become more similar to one another. Recent approaches to sentiment classification have proposed combining machine learning with background knowledge from sentiment lexicons to improve performance. We therefore also present a simple yet effective approach for augmenting S3 with background knowledge from SentiWordNet. Evaluation shows that S3 significantly outperforms the state-of-the-art SVM classifier. Results also show that using background knowledge from SentiWordNet significantly improves the performance of S3.
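To make the idea of per-class sub-spaces more concrete, here is a minimal sketch of one possible reading of it. It is an assumption-laden illustration and not the S3 algorithm itself. For each class it builds a term-relatedness matrix from that class's documents only, using binary co-occurrence plus an identity term. It then projects a document into each class sub-space and assigns the class whose projected centroid is most cosine-similar. The relatedness measure, the centroid-cosine decision rule, and every function name (`class_subspace`, `project`, `train`, `classify`) are hypothetical choices made for this example. The SentiWordNet augmentation is not modelled.

```python
from collections import Counter
import math


def tokens(doc):
    # Naive whitespace tokenisation (an assumption; no stemming or stop-word removal).
    return doc.lower().split()


def bow(doc, vocab):
    """Bag-of-words count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokens(doc))
    return [float(counts[t]) for t in vocab]


def class_subspace(docs, vocab):
    """Hypothetical per-class relatedness matrix:
    R = I + sum over this class's documents of outer(b, b), b = term-presence vector.
    Terms that co-occur in documents of this class become related."""
    n = len(vocab)
    index = {t: i for i, t in enumerate(vocab)}
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for doc in docs:
        present = {index[t] for t in tokens(doc) if t in index}
        for i in present:
            for j in present:
                R[i][j] += 1.0
    return R


def project(vec, R):
    """Map a document vector into a class sub-space: v' = v R."""
    n = len(vec)
    return [sum(vec[i] * R[i][j] for i in range(n)) for j in range(n)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train(labelled):
    """labelled: dict mapping class label -> list of training documents."""
    vocab = sorted({t for docs in labelled.values() for d in docs for t in tokens(d)})
    model = {}
    for label, docs in labelled.items():
        R = class_subspace(docs, vocab)
        # Class centroid as a summed BOW vector (cosine ignores the scale factor).
        centroid = [sum(col) for col in zip(*(bow(d, vocab) for d in docs))]
        model[label] = (R, project(centroid, R))
    return vocab, model


def classify(doc, vocab, model):
    """Pick the class whose sub-space makes the document closest to its centroid."""
    q = bow(doc, vocab)
    return max(model, key=lambda c: cosine(project(q, model[c][0]), model[c][1]))


train_docs = {
    "pos": ["great film loved it", "loved the acting great story"],
    "neg": ["boring film hated it", "hated the plot boring story"],
}
vocab, model = train(train_docs)
print(classify("great acting", vocab, model))  # pos
print(classify("boring plot", vocab, model))   # neg
```

In this toy setup, "acting" never appears in a negative review, so it is related to "great", "loved", "story" and "the" only inside the positive sub-space. This is the sense in which building relations per class pulls same-class documents together. A real system would use a proper relatedness measure and a trained classifier rather than this centroid rule.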