Learning Subjective Language: Feature Engineered vs. Deep Models

Treatment of subjective language is a vital component of a sentiment analysis system. However, detection of subjectivity (i.e., subjective vs. objective content) has attracted far less attention than sentiment recognition (i.e., positive vs. negative language). In particular, online social context and the structural attributes of communication therein promise to help improve learning of subjective language. In this work, we describe successful models exploiting a rich and comprehensive feature set based on the structural and social context of the Twitter domain. In light of the recent successes of deep learning models, we also experiment with deep gated recurrent neural networks (GRUs) on the task. Our models exploiting structure and social context with an SVM achieve more than 12% higher accuracy than a competitive baseline on a blind test set. Our GRU model yields even better performance, reaching 77.19 (i.e., ∼14.50% higher than the baseline on the same test set, p < 0.001).
