An Indonesian Hoax News Detection System Using Reader Feedback and Naïve Bayes Algorithm

Abstract: Hoax news in Indonesia spreads at an alarming rate. To curb this, a hoax news detection system needs to be built and put into practice. Such a system can combine readers’ feedback with the Naïve Bayes algorithm, which is used to verify news. Over time, readers’ feedback causes the corpus in the database to grow, which could improve system performance. The current research aims to build such a system. System performance is evaluated under two conditions: with and without news sources (URLs). The system detects hoax news well under both conditions. The highest precision, recall, and F-measure values when URLs are included are 0.91, 1, and 0.95, respectively. Without URLs, the highest precision, recall, and F-measure values are 0.88, 1, and 0.94, respectively.
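As a quick consistency check on the reported scores, the sketch below recomputes the F-measure from the precision and recall values given in the abstract. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are 0
    return 2 * precision * recall / (precision + recall)


# Highest precision/recall pairs reported in the abstract.
with_url = f_measure(0.91, 1.0)
without_url = f_measure(0.88, 1.0)

print(f"F-measure with URL:    {with_url:.2f}")
print(f"F-measure without URL: {without_url:.2f}")
```

Under this assumption the recomputed values round to 0.95 and 0.94, which agrees with the figures reported in the abstract.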
