Instance Selection Improves Cross-Lingual Model Training for Fine-Grained Sentiment Analysis

Scarcity of annotated corpora for many languages is a bottleneck for training fine-grained sentiment analysis models that can tag aspects and subjective phrases. We propose to exploit statistical machine translation to alleviate the need for training data by projecting annotated data from a source language to a target language, so that a supervised fine-grained sentiment analysis system can be trained in the target language. To avoid the negative influence of poor-quality translations, we propose a filtering approach based on machine translation quality estimation measures that selects only high-quality sentence pairs for projection. We evaluate on the German/English language pair using a corpus of product reviews annotated in both languages, and compare against in-target-language training. Projection without any filtering leads to 23% F1 in the task of detecting aspect phrases, compared to 41% F1 for in-target-language training. Our approach obtains up to 47% F1. Further, we show that, even without filtering, the detection of subjective phrases is competitive with in-target-language training.
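The abstract combines two steps: filter translated sentence pairs by an estimated translation quality, then carry the source-side annotations over to the target side. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation. It assumes word alignments are already available as (source index, target index) links, that each pair carries a single scalar quality score (the names `SentencePair`, `project_span`, `project_corpus`, `quality` and `threshold` are all hypothetical), and that a projected span is the smallest contiguous target range covering every aligned token, which is an assumed heuristic.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# A labelled span over token indices: (start, end_exclusive, label).
Span = Tuple[int, int, str]


@dataclass
class SentencePair:
    source_tokens: List[str]
    target_tokens: List[str]
    # Word alignment as (source_index, target_index) links (assumed given).
    alignment: Set[Tuple[int, int]]
    # Annotations on the source side, e.g. aspect or subjective phrases.
    source_spans: List[Span]
    # Hypothetical translation quality estimate; higher means better.
    quality: float


def project_span(span: Span, alignment: Set[Tuple[int, int]]) -> Optional[Span]:
    """Map a source span to the smallest contiguous target span covering
    all target tokens aligned to it (an assumed heuristic)."""
    start, end, label = span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None  # nothing aligned: drop this annotation
    return (min(targets), max(targets) + 1, label)


def project_corpus(
    pairs: List[SentencePair], threshold: float
) -> List[Tuple[List[str], List[Span]]]:
    """Keep only pairs whose quality score reaches the threshold, then
    project their annotations onto the target tokens."""
    projected = []
    for pair in pairs:
        if pair.quality < threshold:
            continue  # filtered out as a likely poor translation
        spans = []
        for span in pair.source_spans:
            target_span = project_span(span, pair.alignment)
            if target_span is not None:
                spans.append(target_span)
        projected.append((pair.target_tokens, spans))
    return projected


if __name__ == "__main__":
    # Toy English -> German pairs; alignments and scores are made up.
    good = SentencePair(
        source_tokens=["The", "battery", "life", "is", "great"],
        target_tokens=["Die", "Akkulaufzeit", "ist", "toll"],
        alignment={(0, 0), (1, 1), (2, 1), (3, 2), (4, 3)},
        source_spans=[(1, 3, "aspect"), (4, 5, "subjective")],
        quality=0.9,
    )
    poor = SentencePair(
        source_tokens=["The", "screen", "is", "too", "dark"],
        target_tokens=["Das", "Schirm", "dunkel"],
        alignment={(1, 1), (4, 2)},
        source_spans=[(1, 2, "aspect")],
        quality=0.2,
    )
    for tokens, spans in project_corpus([good, poor], threshold=0.5):
        print(tokens, spans)
```

With a threshold of 0.5 only the first pair survives: "battery life" is projected onto "Akkulaufzeit" as span (1, 2) and "great" onto "toll" as span (3, 4), while the low-scoring pair is discarded before projection. Filtering first trades training-set size for cleaner projected labels, which is the trade-off the abstract's F1 comparison speaks to.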

Authors: Philipp Cimiano, Roman Klinger