Sentiment strength detection in short informal text

A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches. © 2010 Wiley Periodicals, Inc.
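The general idea of a term-strength lookup table combined with handling of informal spelling can be illustrated with a minimal sketch. This is not the SentiStrength implementation: the lexicon entries, the `TERM_STRENGTHS`, `normalise`, and `score` names, the rule that emphatic letter repetition adds 1 to a term's strength, and the choice to report the strongest term of each polarity are all assumptions made for illustration. Only the two 1–5 output scales and the use of a lookup table come from the abstract.

```python
import re

# Hypothetical term strengths (illustrative values, not from the paper):
# positive terms score 2..5, negative terms score -2..-5.
TERM_STRENGTHS = {
    "love": 3,
    "good": 2,
    "great": 3,
    "hate": -4,
    "sad": -3,
    "awful": -4,
}

# A character repeated three or more times, as in "loooove".
REPEATED = re.compile(r"(.)\1{2,}")


def normalise(token):
    """Map an emphatically spelled token back to a lexicon term if possible.

    Returns (token, emphasised). Runs of 3+ letters are collapsed first to
    two letters ("gooood" -> "good"), then to one ("loooove" -> "love").
    """
    if not REPEATED.search(token):
        return token, False
    for replacement in (r"\1\1", r"\1"):
        candidate = REPEATED.sub(replacement, token)
        if candidate in TERM_STRENGTHS:
            return candidate, True
    return token, True


def score(text):
    """Return (positive, negative) strengths, each on a 1..5 scale."""
    positive, negative = 1, 1  # 1 means no sentiment of that polarity
    for raw in re.findall(r"[a-z]+", text.lower()):
        token, emphasised = normalise(raw)
        strength = TERM_STRENGTHS.get(token, 0)
        if strength == 0:
            continue
        if emphasised:
            # Assumed rule: emphatic spelling raises the magnitude by 1.
            strength += 1 if strength > 0 else -1
        if strength > 0:
            positive = min(5, max(positive, strength))
        else:
            negative = min(5, max(negative, -strength))
    return positive, negative


print(score("I loooove this, it is good"))  # (4, 1)
print(score("I love you but hate mondays"))  # (3, 4)
```

Reporting positive and negative strength separately, rather than as one net score, mirrors the abstract's two independent 1–5 scales and lets a single message register as both positive and negative.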
