Using Sequence Kernels to identify Opinion Entities in Urdu

Automatic extraction of opinion holders and targets (together referred to as opinion entities) is an important subtask of sentiment analysis. In this work, we attempt to accurately extract opinion entities from Urdu newswire. Because Urdu lacks the resources needed to train the role labelers and dependency parsers used for English, we present a more robust approach that (i) generates candidate word sequences corresponding to opinion entities and (ii) subsequently disambiguates these sequences as opinion holders or targets. Detecting the boundaries of such candidate sequences in Urdu differs considerably from English because, in Urdu, grammatical categories such as tense, gender and case are captured in word inflections. We exploit the morphological inflections associated with nouns and verbs to identify sequence boundaries correctly. Different levels of contextual information are encoded to train standard linear and sequence kernels. The best performance obtained for opinion entity detection for Urdu sentiment analysis is an F-score of 58.06% using sequence kernels and 61.55% using a combination of sequence and linear kernels.
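To make the notion of a sequence kernel concrete, the following is a minimal sketch of a gap-weighted subsequence kernel over word sequences, in the standard dynamic-programming formulation from the string-kernel literature, together with one simple way to combine it with a linear kernel. It is an illustration under stated assumptions, not the exact kernel or combination used in this work: the function names, the decay factor `lam`, the subsequence length `n`, the mixing `weight`, and the placeholder tokens are all hypothetical.

```python
import math


def subsequence_kernel(s, t, n=2, lam=0.5):
    """Gap-weighted subsequence kernel between token lists s and t.

    Counts common (possibly non-contiguous) subsequences of length n,
    each weighted by lam raised to the total span it covers in s and t.
    """
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b]) as b grows
            for b in range(i, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[i - 1][a - 1][b - 1]
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, n=2, lam=0.5):
    """Cosine-style normalization so that sequence length matters less."""
    denom = math.sqrt(subsequence_kernel(s, s, n, lam) *
                      subsequence_kernel(t, t, n, lam))
    return subsequence_kernel(s, t, n, lam) / denom if denom > 0 else 0.0


def combined_kernel(s, t, x, y, n=2, lam=0.5, weight=0.5):
    """Assumed combination: weighted sum of sequence and linear kernels.

    x and y are numeric feature vectors for the same two candidates.
    A non-negative weighted sum of valid kernels is itself a valid kernel.
    """
    linear = sum(xi * yi for xi, yi in zip(x, y))
    return weight * normalized_kernel(s, t, n, lam) + (1 - weight) * linear


if __name__ == "__main__":
    # Placeholder tokens; real inputs would be candidate word sequences.
    seq_a = ["w1", "w2", "w3"]
    seq_b = ["w1", "w4", "w3"]
    print(normalized_kernel(seq_a, seq_b))
    print(combined_kernel(seq_a, seq_b, [1.0, 0.0], [1.0, 1.0]))
```

A Gram matrix filled with such values can be passed to any SVM implementation that accepts precomputed kernels.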
