A Machine Learning Approach For Opinion Holder Extraction In Arabic Language

Opinion mining aims to extract useful subjective information from large amounts of text. Opinion holder recognition is a task that has not yet been addressed for the Arabic language. The task essentially requires a deep understanding of clause structure, and the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents pioneering research on opinion holder extraction in Arabic news that is independent of any lexical parser. We investigate the construction of a comprehensive feature set to compensate for the lack of structural parsing output. The proposed feature set is adapted from previous work on English and coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different models are evaluated in cross-validation experiments, achieving an F-measure of 54.03. We publicly release the resulting corpus and lexicon to the opinion mining community to encourage further research.
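As a rough illustration of how parser-free features for a CRF sequence labeler might be assembled, the following minimal sketch builds a per-token feature dictionary from the word, its part-of-speech tag, its named entity tag, and a semantic field lookup, plus a small context window. It is not the paper's actual feature set: the function names, the feature keys, the window size, the transliterated example sentence, and the `COMMUNICATION` field label are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def token_features(
    tokens: List[str],
    pos_tags: List[str],
    ne_tags: List[str],
    i: int,
    semantic_fields: Optional[Dict[str, str]] = None,
    window: int = 2,
) -> Dict[str, str]:
    """Build a flat feature dict for token i using no parse tree (hypothetical scheme)."""
    fields = semantic_fields or {}
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],
        "ne": ne_tags[i],
        # Semantic field of the token itself, or NONE if it is not in the lexicon.
        "field": fields.get(tokens[i], "NONE"),
    }
    # Neighbouring tokens stand in for the structural cues a parser would supply.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]"] = tokens[j]
            feats[f"pos[{offset:+d}]"] = pos_tags[j]
            feats[f"field[{offset:+d}]"] = fields.get(tokens[j], "NONE")
        else:
            # Mark positions that fall outside the sentence.
            feats[f"word[{offset:+d}]"] = "<PAD>"
    return feats


def sentence_features(
    tokens: List[str],
    pos_tags: List[str],
    ne_tags: List[str],
    semantic_fields: Optional[Dict[str, str]] = None,
    window: int = 2,
) -> List[Dict[str, str]]:
    """Feature dicts for every token in one sentence."""
    return [
        token_features(tokens, pos_tags, ne_tags, i, semantic_fields, window)
        for i in range(len(tokens))
    ]


# Illustrative transliterated sentence ("Ahmad said that the decision is right").
tokens = ["qala", "Ahmad", "inna", "al-qarar", "sa'ib"]
pos_tags = ["VERB", "NOUN", "PART", "NOUN", "ADJ"]
ne_tags = ["O", "PER", "O", "O", "O"]
# Hypothetical semantic field lexicon entry for a reporting verb.
semantic_fields = {"qala": "COMMUNICATION"}

features = sentence_features(tokens, pos_tags, ne_tags, semantic_fields)
```

Feature dictionaries of this shape, paired with one holder or non-holder label per token, are the kind of input a CRF toolkit that accepts per-token features could be trained on. In this sketch, the candidate holder `Ahmad` sees both its own person tag and the communication-verb field of the preceding token, which is the sort of local evidence that can partly substitute for a parsed subject relation.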
