Social Media Analytics for Stance Mining: A Multi-Modal Approach with Weak Supervision

People express their opinions on blogs and other social media platforms. By a recent estimate, interactions on Twitter alone produce over 500 million tweets per day. The magnitude of this data enables applications of opinion mining that were previously challenging, such as finding users' stance (pro or con) on topics of interest. However, a major barrier to using this much data is the cost of hand-labeling examples for machine learning. The barrier is even more apparent in stance mining, because opinions can change over time and can concern any issue. To reduce the need for hand-labeled data by taking the complex interactions of social media users and their social influence into account, this dissertation develops semi-supervised methods for stance mining.

Most existing studies on stance mining take a simplistic view, assuming that a sentence (such as a tweet) holds a perspective that is independent of its context and of the author's network position. This approach to stance learning leaves three crucial challenges unresolved. First, how do we train stance-learning models on new topics with minimal labeling effort? Discussion topics change fast and new issues emerge, which makes it difficult to reuse previously labeled data. However, artifacts of social networks such as hashtags can give a noisy signal about the stance of users. To extract the signal from the noise, I develop methods that find useful hashtags by exploiting how users in the pro-group and the anti-group use popular hashtags. Second, how can we use multiple interaction modalities for stance mining? Users' opinions are evident in different types of interactions, e.g., tweeting, retweeting, or liking. I develop a semi-supervised method based on co-training that jointly trains multiple stance classifiers on different interaction modalities, resulting in a better stance prediction model. Third, how can we leverage users' networks for stance prediction? Current approaches to stance learning ignore important network factors such as the interactions of social media users (e.g., a person's preferences can also be inferred from their friends' preferences). I use network alignment as one of the training signals for the stance classifiers.

My thesis brings a new direction to the stance learning problem that is grounded in social theory, is more amenable to analyzing activities on social media, and allows effective learning from multiple types of interactions without requiring large amounts of labeled data. By labeling only a few hashtags used in Twitter conversations on a few controversial topics, my approach predicts both the stance of users (whether they are pro or con on a topic) with over 80% accuracy and the stance in conversations (whether they favor or deny others' posts) with over 70% accuracy.
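To make the idea of hashtag-seeded co-training over two interaction modalities concrete, here is a minimal sketch in Python. It is a generic co-training loop, not the dissertation's actual models: the tiny naive Bayes classifier, the names `TokenNB`, `seed_from_hashtags` and `co_train`, the toy users, and the 0.65 confidence threshold are all assumptions made for illustration. One view holds a user's tweet tokens and the other the accounts they retweet.

```python
import math
from collections import Counter


class TokenNB:
    """Tiny multinomial naive Bayes over token lists (illustrative stand-in classifier)."""

    def fit(self, docs, labels):
        self.counts = {"pro": Counter(), "con": Counter()}
        self.n = Counter(labels)
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set(self.counts["pro"]) | set(self.counts["con"])
        return self

    def prob_pro(self, doc):
        """Return P(pro | doc) with add-one smoothing (one extra slot for unseen tokens)."""
        score = {}
        for y in ("pro", "con"):
            total = sum(self.counts[y].values()) + len(self.vocab) + 1
            s = math.log((self.n[y] + 1) / (sum(self.n.values()) + 2))
            for tok in doc:
                s += math.log((self.counts[y][tok] + 1) / total)
            score[y] = s
        m = max(score.values())
        e = {y: math.exp(v - m) for y, v in score.items()}
        return e["pro"] / (e["pro"] + e["con"])


def seed_from_hashtags(users, hashtag_stance):
    """Weakly label users whose tweets contain labeled hashtags of exactly one stance."""
    seeds = {}
    for user, (text_tokens, _) in users.items():
        stances = {hashtag_stance[t] for t in text_tokens if t in hashtag_stance}
        if len(stances) == 1:
            seeds[user] = stances.pop()
    return seeds


def co_train(users, seed_labels, rounds=5, threshold=0.65):
    """Grow a shared label pool: each view's classifier labels users it is confident about."""
    labels = dict(seed_labels)
    for _ in range(rounds):
        train = list(labels)
        y = [labels[u] for u in train]
        clf_text = TokenNB().fit([users[u][0] for u in train], y)
        clf_rt = TokenNB().fit([users[u][1] for u in train], y)
        added = False
        for user, (text_tokens, rt_tokens) in users.items():
            if user in labels:
                continue
            for p in (clf_text.prob_pro(text_tokens), clf_rt.prob_pro(rt_tokens)):
                if p >= threshold or p <= 1 - threshold:
                    labels[user] = "pro" if p >= threshold else "con"
                    added = True
                    break
        if not added:  # no classifier was confident about any remaining user
            break
    return labels


# Toy data (hypothetical): user -> (tweet tokens, retweeted accounts)
users = {
    "u1": (["#yes", "vote"], ["@proA"]),
    "u2": (["#no", "stop"], ["@conA"]),
    "u3": (["vote", "now"], ["@proA", "@proB"]),
    "u4": (["stop", "it"], ["@conA", "@conB"]),
    "u5": (["hello"], ["@proB"]),
    "u6": (["bye"], ["@conB"]),
}
seeds = seed_from_hashtags(users, {"#yes": "pro", "#no": "con"})
final = co_train(users, seeds)
print(final)
```

On this toy data, only `u1` and `u2` receive seed labels from the two labeled hashtags. Users `u5` and `u6` share no text tokens with any labeled user, so the text view alone cannot place them; they are labeled only after the retweet view has learned from `u3` and `u4`, which is the kind of benefit a second modality is meant to provide.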
