What Ignites a Reply?: Characterizing Conversations in Microblogs

Microblog platforms provide a medium to share content and interact with other users. With the large-scale data generated on these platforms, the origins of and reasons for users' engagement in conversations have attracted the attention of the research community. In this paper, we analyze the factors that might spark conversations on Twitter, for the English and Spanish languages. Using a corpus of 2.7 million tweets, we reconstruct existing conversations and then extract several contextual and content features. Based on the extracted features, we train and evaluate several predictive models to identify tweets that will spark a conversation. Our findings show that conversations are more likely to be initiated by users with a high activity level and popularity. For less popular users, the type of content generated is a more important factor. Experimental results show that the best predictive model obtains an average score of $F_1 = 0.80$. We have made the dataset, scripts, and code used in this paper available to the research community via GitHub.
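As an illustration of the conversation-reconstruction step, the following is a minimal sketch, not the procedure used in the paper. It assumes each tweet is a dictionary with an `id` and an `in_reply_to_status_id` field (the field name follows the Twitter REST API; the function name and the toy data are hypothetical). A tweet is treated as having sparked a conversation if at least one reply in the corpus traces back to it.

```python
from collections import defaultdict


def reconstruct_conversations(tweets):
    """Group tweet ids into threads by following reply links to a root tweet.

    Assumption: each tweet is a dict with "id" and "in_reply_to_status_id".
    """
    by_id = {t["id"]: t for t in tweets}

    def find_root(tweet_id):
        seen = set()  # guards against malformed, cyclic reply chains
        while True:
            parent = by_id[tweet_id].get("in_reply_to_status_id")
            # Stop when there is no parent, the parent is outside the corpus,
            # or we have already visited this tweet.
            if parent is None or parent not in by_id or tweet_id in seen:
                return tweet_id
            seen.add(tweet_id)
            tweet_id = parent

    threads = defaultdict(list)
    for t in tweets:
        threads[find_root(t["id"])].append(t["id"])

    # Keep only roots that received at least one reply.
    return {root: ids for root, ids in threads.items() if len(ids) > 1}


# Toy corpus (hypothetical data).
tweets = [
    {"id": 1, "in_reply_to_status_id": None},
    {"id": 2, "in_reply_to_status_id": 1},
    {"id": 3, "in_reply_to_status_id": 2},
    {"id": 4, "in_reply_to_status_id": None},  # never replied to
    {"id": 5, "in_reply_to_status_id": 99},    # parent missing from corpus
]
conversations = reconstruct_conversations(tweets)
```

In this sketch, the keys of `conversations` could serve as positive labels for a classifier, with all other non-reply tweets as negatives; how the paper actually defines its labels is not stated in the abstract.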
