Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification

Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structures used. Automatically identifying the high-impact sub-structures relevant to a given task is therefore an important challenge. In this paper we present a systematic study of sequence and convolution kernels, alone and in combination, using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.
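To make the notion of a convolution kernel over syntactic sub-structures concrete, the following is a minimal sketch of a subset-tree kernel in the style of Collins and Duffy. It is an illustration under stated assumptions, not the implementation used in the paper: the nested-tuple tree encoding, the function names, and the `decay` parameter are all hypothetical choices made for this example.

```python
from itertools import product

# Assumed encoding (not from the paper): an internal node is a tuple
# (label, child1, child2, ...); a leaf is a plain string (a word).

def production(node):
    """Return the grammar production at an internal node.

    Each child is recorded as (label, is_leaf) so that a word and a
    nonterminal with the same spelling are never confused.
    """
    children = tuple(
        (c, True) if isinstance(c, str) else (c[0], False) for c in node[1:]
    )
    return (node[0], children)

def internal_nodes(tree):
    """Collect every internal (non-leaf) node of a tree."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def common_fragments(n1, n2, decay):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Matching productions guarantee c1 and c2 are both leaves
        # or both internal nodes; leaves contribute a factor of 1.
        if not isinstance(c1, str):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Sum the shared-fragment counts over all pairs of internal nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a, b in product(internal_nodes(t1), internal_nodes(t2))
    )

t1 = ("NP", ("JJ", "good"), ("NN", "movie"))
t2 = ("NP", ("JJ", "good"), ("NN", "film"))
print(tree_kernel(t1, t2))
```

With `decay=1.0` the kernel value is the number of tree fragments the two inputs share, so larger input structures contribute many more fragments. This is one way to see why such models are sensitive to the type and size of the structures they are given. The sketch recomputes node pairs recursively without memoization, which is adequate only for small trees.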
