Users Are Known by the Company They Keep: Topic Models for Viewpoint Discovery in Social Networks

Social media platforms such as weblogs and social networking sites give Internet users an unprecedented means to express their opinions and debate a wide range of issues. As their importance in public communication grows, however, these platforms may foster echo chambers and filter bubbles: homophily and content personalization increasingly expose users to opinions that conform to their own. There is therefore a need for unbiased systems that can identify varied viewpoints and provide access to them. To address this task, we propose a novel unsupervised topic model, the Social Network Viewpoint Discovery Model (SNVDM). Given a specific issue (e.g., U.S. policy) together with the text and social interactions of the users discussing this issue on a social networking site, SNVDM jointly identifies the issue's topics, the users' viewpoints, and the discourse pertaining to the different topics and viewpoints. To overcome the potential sparsity of the social network (i.e., some users interact with only a few other users), we propose an extension to SNVDM based on the Generalized Pólya Urn sampling scheme (SNVDM-GPU) that leverages "acquaintances of acquaintances" relationships. We benchmark the proposed models against three baselines, namely TAM, SN-LDA, and VODUM, on a viewpoint clustering task using two real-world datasets. The results provide evidence that SNVDM and its extension SNVDM-GPU significantly outperform these state-of-the-art baselines, and show that utilizing social interactions greatly improves viewpoint clustering performance.
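
The Generalized Pólya Urn idea behind SNVDM-GPU can be illustrated with a small sketch. This is not the sampler used in the paper, whose details are not given in the abstract. It is a toy urn under stated assumptions: the names `neighbor_promotion` and `gpu_draw`, the three-user graph, and the promotion weight of 0.5 are all hypothetical. In a standard Pólya urn, drawing a ball adds one more ball of the same color; in the generalized scheme, the draw also adds (possibly fractional) weight to related colors. Here the "colors" are users, and drawing a user also promotes that user's acquaintances, which is one way sparse interaction counts could be smoothed.

```python
import random
from collections import Counter


def neighbor_promotion(graph, weight):
    """Build a promotion table: drawing user u also promotes u's neighbors.

    graph: dict mapping user -> list of acquaintances (hypothetical toy input)
    weight: extra mass given to each neighbor (assumed value, not from the paper)
    """
    return {user: {nbr: weight for nbr in nbrs} for user, nbrs in graph.items()}


def gpu_draw(urn, promotion, rng):
    """Draw one user from the urn, then reinforce the user and related users.

    urn: Counter mapping user -> current (possibly fractional) weight
    promotion: dict mapping drawn user -> {related user: extra weight}
    rng: a random.Random instance, passed in for reproducibility
    """
    users = list(urn)
    weights = [urn[u] for u in users]
    drawn = rng.choices(users, weights=weights, k=1)[0]
    urn[drawn] += 1.0  # standard Polya urn step: reinforce the drawn user
    for related, extra in promotion.get(drawn, {}).items():
        urn[related] += extra  # generalized step: promote the acquaintances
    return drawn


# Toy network: alice and carol never interact directly, only through bob.
graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
promotion = neighbor_promotion(graph, weight=0.5)
urn = Counter({user: 1.0 for user in graph})
rng = random.Random(0)
for _ in range(5):
    gpu_draw(urn, promotion, rng)
print(dict(urn))
```

Each draw adds one unit of mass to the drawn user plus the promotion weights of that user's neighbors, so users with few direct interactions still accumulate mass through their acquaintances.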
