Modelling Online Comment Threads from their Start

The social Web is a widely used platform for online discussion. Across social media, users can start discussions by posting a topical image, URL, or message. Upon seeing this initial post, other users may add their own comments to the post or to another user's comment. The resulting discourse forms a comment thread, and such threads constitute an enormous portion of modern online communication. Comment threads are often viewed as trees: nodes represent the post and its comments, while directed edges represent reply-to relationships. The goal of the present work is to predict the size and shape of these comment threads. Existing models do this by observing the first several comments and then fitting a predictive model. However, most comment threads are relatively small, and waiting for comments to materialize runs counter to the goal of the prediction task. We therefore introduce the Comment Thread Prediction Model (CTPM), which accurately predicts the size and shape of a comment thread using only the text of the initial post, so predictions can be made for new posts that have no observable comments yet. We find that the CTPM significantly outperforms existing models and competitive baselines on thousands of Reddit discussions from nine varied subreddits, particularly for new posts.
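The tree view of a comment thread described above can be illustrated with a minimal sketch. It is not the CTPM itself. It only shows how a thread stored as reply-to pairs yields size and shape measurements. The function name `thread_stats`, the `(comment_id, parent_id)` input format, and the choice of depth and maximum width as shape measures are assumptions made for illustration.

```python
from collections import defaultdict, deque


def thread_stats(replies):
    """Summarize a comment thread given as (node_id, parent_id) pairs.

    The initial post is the single entry whose parent_id is None.
    Hypothetical helper: depth and max width are illustrative shape
    measures, not necessarily the ones used in the paper.
    """
    children = defaultdict(list)
    root = None
    for node, parent in replies:
        if parent is None:
            root = node
        else:
            # Directed edge: node is a reply to parent.
            children[parent].append(node)
    if root is None:
        raise ValueError("thread has no initial post")

    # Breadth-first traversal from the post, counting nodes per level.
    width = defaultdict(int)
    queue = deque([(root, 0)])
    while queue:
        node, level = queue.popleft()
        width[level] += 1
        for child in children[node]:
            queue.append((child, level + 1))

    return {
        "size": sum(width.values()) - 1,  # comments only, post excluded
        "depth": max(width),              # longest reply chain
        "max_width": max(width.values()), # most nodes on any one level
    }


example = [
    ("post", None),
    ("c1", "post"),
    ("c2", "post"),
    ("c3", "c1"),
    ("c4", "c3"),
]
print(thread_stats(example))
```

In this toy thread, two comments reply directly to the post and one reply chain continues two levels further, giving four comments, a depth of three, and a maximum width of two.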
