Deep sentiments in Roman Urdu text using Recurrent Convolutional Neural Network model

Abstract Although over 64 million people worldwide speak Urdu and are well acquainted with its Roman script, little research has been devoted to sentiment analysis or to building language resources for Roman Urdu. This article proposes a deep learning model to mine the emotions and attitudes that people express in Roman Urdu, using a corpus of 10,021 sentences from 566 online threads in the following genres: Sports; Software; Food & Recipes; Drama; and Politics. The objectives of this research are twofold: (1) to develop a human-annotated benchmark corpus for sentiment analysis of the under-resourced Roman Urdu language; and (2) to evaluate sentiment analysis techniques using Rule-based, N-gram, and Recurrent Convolutional Neural Network (RCNN) models. Using the corpus, in which three experts annotated each sentence as positive, negative, or neutral with a Cohen's Kappa score of 0.557, we run two sets of tests: binary classification (positive and negative) and tertiary classification (positive, negative, and neutral). Finally, we analyze the results of the RCNN model by comparing them with the outcomes of the Rule-based and N-gram models. We show that the RCNN model outperforms the baseline models, achieving an accuracy of 0.652 for binary classification and 0.572 for tertiary classification.
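
To make the reported agreement figure concrete, the following is a minimal sketch of how Cohen's Kappa can be computed for one pair of annotators. The function name `cohen_kappa` and the toy labels are illustrative assumptions, not the authors' code or data, and the abstract does not state how agreement was aggregated across the three annotators, so only the pairwise case is shown.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label
    # frequencies, summed over the labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (made up for this sketch): positive, negative, neutral.
annotator_1 = ["pos", "neg", "neu", "pos", "neg", "pos"]
annotator_2 = ["pos", "neg", "pos", "pos", "neu", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so a value such as the reported 0.557 is lower than the plain fraction of identically labelled sentences would be.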
