Characterizing Topics in Social Media Using Dynamics of Conversation

Online social media provides massive open-ended platforms for users of a wide variety of backgrounds, interests, and beliefs to interact and debate, facilitating countless discussions across a myriad of subjects. With numerous unique voices being lent to the ever-growing information stream, it is essential to consider how the types of conversations that result from a social media post represent the post itself. We hypothesize that the biases and predispositions of users cause them to react to different topics in different ways, not all of which are intended by the sender. In this paper, we introduce a set of unique features that capture patterns of discourse, allowing us to empirically explore the relationship between a topic and the conversations it induces. Utilizing “microscopic” trends to describe “macroscopic” phenomena, we set a paradigm for analyzing information dissemination through the user reactions that arise from a topic, eliminating the need to analyze the text of the discussions. Using a Reddit dataset, we find that our features not only enable classifiers to accurately distinguish between content genres, but also identify more subtle semantic differences in content under a single topic and isolate outliers whose subject matter is substantially different from the norm.
