Improving Fake News Detection Using K-means and Support Vector Machine Approaches

Abstract: Fake news and false information are major challenges for all types of media, especially social media. Large social networks such as Facebook and Twitter have acknowledged that their platforms carry a great deal of false information, fake likes and views, and duplicate accounts. Most information appearing on social media is doubtful and in some cases misleading, and it needs to be detected as early as possible to avoid a negative impact on society. The dimensionality of fake news datasets is growing rapidly, so it must be reduced in order to detect false information more accurately with less computation time and complexity. One of the most effective ways to reduce data size is feature selection, which aims to choose a subset of the original features that improves classification performance. In this paper, we propose a feature selection method that integrates K-means clustering with a Support Vector Machine (SVM) and works in four steps. First, the similarities between all features are calculated. Then, the features are divided into several clusters. Next, the final feature set is selected from all clusters. Finally, fake news is classified with the SVM using the final feature subset. The proposed method was evaluated against other state-of-the-art methods on several benchmark datasets, and it classified false information more accurately. Detection performance improved in two respects: the detection runtime decreased, and the classification accuracy increased, because redundant features were eliminated and the dimensionality of the datasets was reduced.
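The first three steps of the method can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity measure, cluster representation, or selection rule is used. The sketch assumes absolute Pearson correlation as the feature similarity, represents each feature by its row of the similarity matrix for K-means, and keeps from each cluster the feature most correlated with the class label. The names `abs_corr`, `kmeans`, and `select_features`, and the toy data, are hypothetical.

```python
import math

def abs_corr(a, b):
    """Absolute Pearson correlation of two equal-length sequences (assumed similarity measure)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov / math.sqrt(va * vb))

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain K-means with deterministic farthest-first initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_features(X, y, k):
    """Return the sorted indices of one representative feature per cluster."""
    cols = [list(c) for c in zip(*X)]  # column-wise view: one list per feature
    d = len(cols)
    # Step 1: pairwise similarity between all features.
    sim = [[abs_corr(cols[i], cols[j]) for j in range(d)] for i in range(d)]
    # Step 2: cluster the features, each described by its similarity row.
    labels = kmeans(sim, k)
    # Step 3: keep the feature most correlated with the label in each cluster (assumed rule).
    relevance = [abs_corr(col, y) for col in cols]
    selected = []
    for c in range(k):
        members = [i for i in range(d) if labels[i] == c]
        if members:
            selected.append(max(members, key=lambda i: relevance[i]))
    return sorted(selected)

# Toy data: features 0 and 1 are redundant with each other, as are features 2 and 3.
X = [[1, 2, 1, 0], [2, 4, 0, 1], [3, 6, 1, 0],
     [4, 8, 0, 1], [5, 10, 1, 0], [6, 12, 0, 1]]
y = [0, 0, 0, 1, 1, 1]
selected = select_features(X, y, k=2)
X_reduced = [[row[i] for i in selected] for row in X]
print(selected, X_reduced[0])
```

In the fourth step, the reduced matrix `X_reduced` would be passed to an SVM classifier, for example scikit-learn's `SVC`, in place of the full feature matrix. Because each cluster contributes a single feature here, the number of clusters `k` directly sets the size of the final feature subset; that coupling is a property of this sketch, not something the abstract states.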
