From bias to opinion: a transfer-learning approach to real-time sentiment analysis

Real-time interaction, which enables live discussions, has become a key feature of most Web applications. In such an environment, the ability to automatically analyze user opinions and sentiments as discussions develop is a powerful resource known as real time sentiment analysis. However, this task comes with several challenges, including the need to deal with highly dynamic textual content that is characterized by changes in vocabulary and its subjective meaning and the lack of labeled data needed to support supervised classifiers. In this paper, we propose a transfer learning strategy to perform real time sentiment analysis. We identify a task - opinion holder bias prediction - which is strongly related to the sentiment analysis task; however, in constrast to sentiment analysis, it builds accurate models since the underlying relational data follows a stationary distribution. Instead of learning textual models to predict content polarity (i.e., the traditional sentiment analysis approach), we first measure the bias of social media users toward a topic, by solving a relational learning task over a network of users connected by endorsements (e.g., retweets in Twitter). We then analyze sentiments by transferring user biases to textual features. This approach works because while new terms may arise and old terms may change their meaning, user bias tends to be more consistent over time as a basic property of human behavior. Thus, we adopted user bias as the basis for building accurate classification models. We applied our model to posts collected from Twitter on two topics: the 2010 Brazilian Presidential Elections and the 2010 season of Brazilian Soccer League. Our results show that knowing the bias of only 10% of users generates an F1 accuracy level ranging from 80% to 90% in predicting user sentiment in tweets.

[1]  Bernard J. Jansen,et al.  Twitter power: Tweets as electronic word of mouth , 2009, J. Assoc. Inf. Sci. Technol..

[2]  Jennifer Neville,et al.  Modeling relationship strength in online social networks , 2010, WWW '10.

[3]  Fabio Crestani,et al.  Proximity-based opinion retrieval , 2010, SIGIR '10.

[4]  Douglas Walton,et al.  Bias, Critical Doubt, and Fallacies , 1991 .

[5]  Lillian Lee,et al.  Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[6]  Alan F. Smeaton,et al.  Classifying sentiment in microblogs: is brevity an advantage? , 2010, CIKM.

[7]  Christos Faloutsos,et al.  Using ghost edges for classification in sparsely labeled networks , 2008, KDD.

[8]  Foster Provost,et al.  A Simple Relational Classifier , 2003 .

[9]  Walther Kindt,et al.  On the problem of bias in political argumentation: An investigation into discussions about political asylum in Germany and Austria , 1997 .

[10]  Tim Groseclose,et al.  A Measure of Media Bias , 2005 .

[11]  Chris H. Q. Ding,et al.  Bridging Domains with Words: Opinion Analysis with Matrix Tri-factorizations , 2010, SDM.

[12]  Foster J. Provost,et al.  Classification in Networked Data: a Toolkit and a Univariate Case Study , 2007, J. Mach. Learn. Res..

[13]  William W. Cohen,et al.  Semi-Supervised Classification of Network Data Using Very Few Labels , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[14]  Brendan T. O'Connor,et al.  From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series , 2010, ICWSM.

[15]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[16]  François Fouss,et al.  Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation , 2007, IEEE Transactions on Knowledge and Data Engineering.

[17]  Jure Leskovec,et al.  Meme-tracking and the dynamics of the news cycle , 2009, KDD.

[18]  Vipin Kumar,et al.  Introduction to Data Mining, (First Edition) , 2005 .

[19]  David A. Shamma,et al.  Characterizing debate performance via aggregated twitter sentiment , 2010, CHI.

[20]  Michael Gamon,et al.  BLEWS: Using Blogs to Provide Context for News Articles , 2008, ICWSM.

[21]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[22]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[23]  Ben Taskar,et al.  Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning) , 2007 .

[24]  Rohini K. Srihari,et al.  OpinionMiner: a novel machine learning system for web opinion mining and extraction , 2009, KDD.

[25]  Alan F. Smeaton,et al.  Crowdsourced real-world sensing: sentiment analysis and the real-time web , 2010 .

[26]  J. D. McCarthy,et al.  The use of newspaper data in the study of collective action , 2003 .

[27]  Prem Melville,et al.  Sentiment analysis of blogs by combining lexical knowledge with text classification , 2009, KDD.

[28]  Vipin Kumar,et al.  Introduction to Data Mining , 2022, Data Mining and Machine Learning Applications.

[29]  Bernhard Schölkopf,et al.  Introduction to Semi-Supervised Learning , 2006, Semi-Supervised Learning.

[30]  John Blitzer,et al.  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification , 2007, ACL.

[31]  Lada A. Adamic,et al.  The political blogosphere and the 2004 U.S. election: divided they blog , 2005, LinkKDD '05.