Rumor has it: Identifying Misinformation in Microblogs

A rumor is commonly defined as a statement whose truth value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) across a network of people. Identifying rumors is crucial in online social media, where large amounts of information spread easily across a large network from sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of three categories of features for correctly identifying rumors: content-based features, network-based features, and microblog-specific memes. Moreover, we show that these features are also effective in identifying disinformers, users who endorse a rumor and help it spread further. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show that our retrieval model achieves a Mean Average Precision (MAP) above 0.95. Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.
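
For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how Mean Average Precision is conventionally computed over ranked retrieval results. It is not the authors' evaluation code: the function names and the toy rankings are illustrative assumptions, and the sketch assumes every relevant item appears somewhere in each ranked list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant
    (for example, a tweet that really is about the queried rumor).
    Assumes all relevant items appear somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Toy rankings (hypothetical, not taken from the paper's dataset).
toy_rankings = [
    [True, False, True],  # AP = (1/1 + 2/3) / 2
    [False, True],        # AP = (1/2) / 1
]
toy_map = mean_average_precision(toy_rankings)
```

A MAP close to 1.0 means that, averaged over queries, relevant items are ranked almost entirely ahead of irrelevant ones.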
