BuzzFace: A News Veracity Dataset with Facebook User Commentary and Egos

Assessing news veracity and detecting social bots have become two of the most pressing problems for social media platforms, yet current gold-standard data are limited. This paper presents a sizeable, feature-rich gold-standard dataset that addresses this gap. The dataset builds on a collection of news items posted to Facebook by nine news outlets during September 2016, which BuzzFeed annotated for veracity. The annotations go beyond a binary label and use four categories: mostly true, mostly false, mixture of true and false, and no factual content. Our contribution integrates the Facebook comments and reactions publicly available through the platform’s Graph API, and provides tailored tools for accessing the web content of the news articles. The features of the accessed articles include body text, images, links, Facebook plugin comments, Disqus plugin comments, and embedded tweets. The embedded tweets offer a promising avenue for expansion across social media platforms. Once developed, this utility yielded over 1.6 million text items, making the dataset over 400 times larger than the current gold standard. The resulting dataset, BuzzFace, is presently the most extensive of its kind, and supports more robust machine learning applications to news veracity assessment and social bot detection than were previously possible.
