Cross lingual opinion holder extraction based on multi-kernel SVMs and transfer learning

Fine-grained opinion analysis places much higher demands on annotated corpora, which makes high-quality analysis difficult when resources are insufficient. In this paper, we explore the use of cross-lingual resources for opinion mining in resource-poor languages. We present a novel approach to cross-lingual opinion holder extraction that selectively leverages a finely annotated opinion corpus in a source language as supplementary training samples for the target language. First, the source-language opinion corpus with fine-grained annotations is translated and projected onto the target language to generate training samples. Then, a classifier based on multi-kernel Support Vector Machines (SVMs) is developed to identify opinion holders in the target language; it uses a tree kernel based on syntactic features and a polynomial kernel based on semantic features. Both kernels are further improved by incorporating a pivot function based on word-pair similarity. To reduce the noise from low-quality translated samples, a transfer learning algorithm iteratively selects high-quality translated samples for training the multi-kernel classifiers on the target language. Evaluations that transfer MPQA, an English opinion corpus (the source language), to Chinese opinion analysis (the target language) show that opinion holder extraction performance on the NTCIR-7 MOAT dataset is improved, exceeding that of the Conditional Random Fields (CRFs) based approach and most systems reported in the NTCIR-7 MOAT evaluation.
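
The idea of pairing a syntactic kernel with a polynomial kernel can be illustrated with a minimal sketch. The abstract does not say how the two kernels are combined, nor does it define the tree kernel or the pivot function, so everything below is an assumption made for illustration: `production_kernel` is a simplified stand-in that only counts shared grammar productions (not a full convolution tree kernel), the weighted sum in `composite_kernel` and its `alpha` parameter are hypothetical, and the pivot function is omitted.

```python
from collections import Counter


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x . y + coef0) ** degree over dense feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def productions(tree):
    """Count grammar productions in a tree given as nested tuples (label, child, ...).

    Leaves are plain strings and contribute no production of their own.
    """
    if isinstance(tree, str):
        return Counter()
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts = Counter({(label, rhs): 1})
    for child in children:
        counts += productions(child)
    return counts


def production_kernel(t1, t2):
    """Simplified stand-in for a tree kernel: dot product of production counts."""
    p1, p2 = productions(t1), productions(t2)
    return float(sum(p1[k] * p2[k] for k in p1.keys() & p2.keys()))


def composite_kernel(a, b, alpha=0.5):
    """Hypothetical combination: weighted sum of the syntactic and semantic kernels.

    Each sample is a (parse_tree, feature_vector) pair. A non-negative weighted
    sum of two valid kernels is itself a valid kernel.
    """
    tree_a, vec_a = a
    tree_b, vec_b = b
    return (alpha * production_kernel(tree_a, tree_b)
            + (1.0 - alpha) * polynomial_kernel(vec_a, vec_b))


# Two toy candidates: a parse fragment plus a small semantic feature vector.
sample_1 = (("S", ("NP", "he"), ("VP", "said")), [1.0, 0.0])
sample_2 = (("S", ("NP", "she"), ("VP", "said")), [1.0, 1.0])
print(composite_kernel(sample_1, sample_2))
```

A Gram matrix built by evaluating `composite_kernel` on every pair of training samples could then be passed to any SVM implementation that accepts precomputed kernels.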
