Mapping the public agenda with topic modeling: The case of the Russian LiveJournal

This article describes agendas as “packages” of topics of varying salience, set by Russian Internet users on LiveJournal, Russia’s leading blog platform. The research involved modeling LiveJournal’s topic structure, viewed as an important component of what is termed here self-generated public opinion. Topic modeling was performed automatically with the latent Dirichlet allocation (LDA) algorithm and complemented with hand labeling of topics. Data were collected with software created by the authors, which generated a relational database storing all posts by the top 2,000 LiveJournal users from three one-month periods: two during the Russian parliamentary and presidential elections of 2011–2012, and one control period. We find that top LiveJournal users divide their attention evenly between “social/political” and “private/recreational” issues, and that this proportion is very stable. However, both automatic and manual analysis show that, in the politicized periods, topics related to national street protests displaced the diverse public affairs issues seen in the control period. The group of topics centered on social issues shows the greatest volatility in composition and may serve as a foundation for monitoring self-generated public opinion by further applying sentiment/opinion mining methods to these topics.
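
As a rough illustration of how hand-labeled topics can be turned into attention shares like those reported above, the following minimal sketch sums per-post topic weights by category and normalizes them. It is not the authors' software: the function name `category_shares`, the toy document-topic matrix, and the assignment of topics to the two categories are all hypothetical, and a real analysis would take the weights from a fitted LDA model.

```python
from collections import defaultdict


def category_shares(doc_topic, topic_labels):
    """Sum document-topic weights by hand-assigned category, then normalize to shares."""
    totals = defaultdict(float)
    for weights in doc_topic:
        for topic_id, weight in enumerate(weights):
            totals[topic_labels[topic_id]] += weight
    grand_total = sum(totals.values())
    return {label: value / grand_total for label, value in totals.items()}


# Made-up document-topic matrix: 3 posts x 4 topics, each row sums to 1.
doc_topic = [
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]

# Hypothetical hand labels, one per topic index.
topic_labels = [
    "social/political",
    "private/recreational",
    "social/political",
    "private/recreational",
]

shares = category_shares(doc_topic, topic_labels)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.3f}")
```

Running the same aggregation separately for each one-month period would allow the category proportions to be compared across the election and control periods.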
