Evaluating machine learning algorithms for fake news detection

This paper explores the application of natural language processing techniques to the detection of ‘fake news’, that is, misleading news stories that come from non-reputable sources. Using a dataset obtained from Signal Media and a list of sources from OpenSources.co, we apply term frequency-inverse document frequency (TF-IDF) of bi-grams and probabilistic context-free grammar (PCFG) detection to a corpus of about 11,000 articles. We evaluate multiple classification algorithms on this dataset: Support Vector Machines, Stochastic Gradient Descent, Gradient Boosting, Bounded Decision Trees, and Random Forests. We find that TF-IDF of bi-grams fed into a Stochastic Gradient Descent model identifies non-credible sources with an accuracy of 77.2%, with PCFGs having slight effects on recall.
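
As a rough illustration of the bi-gram TF-IDF feature named above, the following minimal sketch computes TF-IDF weights over word bi-grams using only the Python standard library. It is not the authors' implementation: the function names, the whitespace tokenizer, the toy corpus, and the particular weighting (term frequency normalized by document length, unsmoothed idf = log(N / df)) are all assumptions chosen for brevity. Weight vectors of this kind are what a linear classifier, such as one trained by stochastic gradient descent, would take as input.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(docs):
    """Return one {bigram: weight} dict per document.

    Assumed weighting: tf = count / number of bigrams in the document,
    idf = log(N / df). Other smoothing variants exist.
    """
    counts = [Counter(bigrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each bigram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            bg: (n / total) * math.log(n_docs / df[bg])
            for bg, n in c.items()
        })
    return vectors


# Toy corpus, invented for illustration only.
docs = [
    "shocking secret they hide",
    "officials said on monday",
    "shocking secret officials said",
]
vectors = tfidf_bigrams(docs)
```

In this toy corpus, a bi-gram that occurs in only one document (such as "they hide") receives a higher weight than one shared across documents (such as "shocking secret"), which is the property that makes TF-IDF useful for separating sources by their characteristic phrasing.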
