Dependency Forest for Sentiment Analysis

Dependency grammars have proven effective in improving sentiment analysis because they directly capture syntactic relations between words. However, most dependency-based systems share a major drawback: they extract features from only the 1-best dependency tree, so parsing errors propagate into the features and degrade performance. We therefore propose an approach that applies dependency forests to sentiment analysis. A dependency forest compactly represents multiple dependency trees. We develop new algorithms for extracting features from a dependency forest. Experiments show that our forest-based system obtains a 5.4-point absolute improvement in accuracy over a bag-of-words system and a 1.3-point improvement over a tree-based system on a widely used sentiment dataset. Our forest-based system also achieves state-of-the-art performance on this dataset.
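To make the contrast between 1-best and forest-based feature extraction concrete, the following is a minimal sketch, not the algorithm developed in the paper. It assumes the alternative parses are available as a scored k-best list and approximates a forest by merging their arcs, whereas a real dependency forest is a packed structure that shares sub-structures across trees. The function names `pack_forest` and `extract_features`, the example sentence, and the scores are all hypothetical.

```python
from collections import defaultdict


def pack_forest(kbest):
    """Merge scored dependency trees into one arc-weight table.

    kbest: list of (tree_score, arcs) pairs, where arcs is a set of
    (head, modifier) word pairs. This is a simplified stand-in for a
    dependency forest (assumption): each arc gets the normalized total
    score of the trees that contain it.
    """
    total = sum(score for score, _ in kbest)
    forest = defaultdict(float)
    for score, arcs in kbest:
        for arc in set(arcs):
            forest[arc] += score / total
    return dict(forest)


def extract_features(forest, threshold=0.0):
    """Turn weighted arcs into fractional head->modifier features."""
    return {
        f"{head}->{mod}": weight
        for (head, mod), weight in forest.items()
        if weight > threshold
    }


# Two hypothetical parses of "I love this movie" that disagree on the
# head of "this"; the scores are made up for illustration.
kbest = [
    (0.6, {("ROOT", "love"), ("love", "I"), ("love", "movie"), ("movie", "this")}),
    (0.4, {("ROOT", "love"), ("love", "I"), ("love", "movie"), ("love", "this")}),
]

forest = pack_forest(kbest)
features = extract_features(forest)
for name, weight in sorted(features.items()):
    print(f"{name}\t{weight:.2f}")
```

A 1-best system would keep only the arcs of the first tree and commit fully to `movie->this`. In this sketch the competing attachment `love->this` survives as a lower-weighted feature, so a single parsing error does not remove the evidence entirely.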
