Effective lexicon-based approach for Urdu sentiment analysis

A lexicon-based approach is used for sentiment analysis of Urdu. Beyond the adjectives, nouns and negations found in traditional sentiment lexicons, we also include verbs, intensifiers and context-dependent words. An Urdu sentiment analyzer is developed that applies rules, makes use of this new lexicon, and classifies sentences as positive, negative or neutral. Evaluated on sentences from Urdu blogs, the analyzer yields the most promising results reported so far for the Urdu language: 89.03% accuracy, 0.86 precision, 0.90 recall and 0.88 F-measure. The results are also evaluated using kappa statistics. A comparison with previous work on Urdu shows that the combination of this Urdu sentiment lexicon and Urdu sentiment analyzer is much more effective than earlier such combinations. The main reasons for the improved effectiveness are the development of a wide-coverage lexicon and the analyzer's effective handling of negations, intensifiers and context-dependent words.
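The general idea of rule-based scoring over a sentiment lexicon can be illustrated with a minimal sketch. This is not the analyzer described above: the tiny lexicon, the numeric weights, the `classify` function, and the two rules (an intensifier directly before a sentiment word scales it, a negation directly after it flips it) are all assumptions chosen for illustration, and context-dependent words are not handled.

```python
# Toy lexicon: every entry and weight below is an illustrative assumption,
# not data from the paper.
POLARITY = {"اچھی": 1.0, "اچھا": 1.0, "بری": -1.0, "برا": -1.0}  # good / bad
INTENSIFIERS = {"بہت": 1.5}  # "very"
NEGATIONS = {"نہیں"}  # "not"


def classify(sentence: str) -> str:
    """Label a whitespace-tokenized sentence as positive, negative or neutral."""
    tokens = sentence.split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in POLARITY:
            continue
        score = POLARITY[token]
        # Assumed rule: an intensifier immediately before the word scales it.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
        # Assumed rule: a negation immediately after the word flips its sign.
        if i + 1 < len(tokens) and tokens[i + 1] in NEGATIONS:
            score = -score
        total += score
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("یہ فلم بہت اچھی ہے"))   # "This film is very good"
print(classify("یہ فلم اچھی نہیں ہے"))  # "This film is not good"
print(classify("یہ فلم ہے"))            # no lexicon word present
```

A real analyzer would need proper tokenization, morphological variants, and a far larger lexicon; the sketch only shows how negations and intensifiers can modify a word's base polarity before the sentence-level decision.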
