Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits

As with other languages, Urdu websites are growing in popularity because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for well-studied languages such as English do not work for Urdu because of differences in script, morphology, and grammar. Urdu should therefore be studied as an independent problem domain. Our approach to sentiment analysis is based on identifying and extracting SentiUnits from the given text using shallow parsing. SentiUnits are the expressions that carry the sentiment information in a sentence. We use a sentiment-annotated, lexicon-based approach. Unfortunately, no such lexicon exists for Urdu, so a major part of this research consists of developing one. This paper is therefore presented as a baseline for this large and complex task. Our goal is to highlight the linguistic (grammar and morphology) as well as the technical aspects of this multidimensional research problem. The system is evaluated on multiple texts, and the results are quite satisfactory.
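The idea of extracting SentiUnits with a sentiment-annotated lexicon can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the romanized toy lexicon, the intensifier and negation lists, the numeric weights, and the names `SentiUnit`, `extract_senti_units`, and `classify` are all hypothetical, and a simple token-window rule stands in for a real shallow parser.

```python
from dataclasses import dataclass
from typing import List

# Toy, hypothetical lexicon of romanized Urdu adjectives with polarity scores.
# A real system would use a much larger sentiment-annotated lexicon in Urdu script.
LEXICON = {"acha": 1.0, "achi": 1.0, "bura": -1.0, "buri": -1.0}
INTENSIFIERS = {"bohat": 1.5}   # assumed weight for "very"
NEGATIONS = {"nahi"}            # "not", which typically follows the adjective


@dataclass
class SentiUnit:
    tokens: List[str]  # the words forming the sentiment-bearing expression
    score: float       # signed polarity after modifiers are applied


def extract_senti_units(tokens: List[str]) -> List[SentiUnit]:
    """Find lexicon words and attach an adjacent intensifier or negation."""
    units = []
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        score = LEXICON[tok]
        span = [tok]
        # An intensifier directly before the head word scales its score.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
            span.insert(0, tokens[i - 1])
        # A negation directly after the head word flips the polarity.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            score = -score
            span.append(tokens[i + 1])
        units.append(SentiUnit(span, score))
    return units


def classify(sentence: str) -> str:
    """Label a sentence by the sign of the summed SentiUnit scores."""
    units = extract_senti_units(sentence.lower().split())
    total = sum(u.score for u in units)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

Under these assumptions, `classify("yeh film bohat achi hai")` yields a positive label and `classify("yeh film achi nahi hai")` a negative one. Whitespace tokenization is itself a simplification, since word segmentation and morphology in Urdu script are considerably harder than this romanized example suggests.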
