Identifying Fake News in Indonesian via Supervised Binary Text Classification

Fake news detection has attracted growing interest from industry and the research community worldwide, including in Indonesia. Recent surveys suggest that people may receive fake news daily, if not several times a day. Researchers and practitioners, supported by the government, are working to counter the spread of fake news. This paper implements a supervised machine learning approach based on the Multi-Layer Perceptron (MLP) to classify news articles, distinguishing fake (hoax) articles from valid ones through binary text classification. For feature extraction, we compare TF-IDF with the Bag-of-Words model, each combined with n-gram models. Our final model achieves a precision of 0.84 and a recall of 0.73 on the hoax class, with a macro-averaged F1-score of 0.82. We also find that preprocessing steps such as stemming and stop-word removal can be very time-consuming while only marginally affecting the classifier's performance on the dataset used in this research.
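The difference between the two feature representations compared in the abstract can be illustrated with a small, dependency-free sketch. It assumes lowercased whitespace tokenisation and uses the smoothed IDF with per-document L2 normalisation (the default variant in scikit-learn's `TfidfVectorizer`). The paper does not say which TF-IDF variant it used, and the Indonesian sentences below are hypothetical toy data.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=2):
    """All contiguous n-grams with n_min <= n <= n_max, joined by single spaces."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def bag_of_words(docs, n_min=1, n_max=2):
    """Bag of Words: raw n-gram counts per document (lowercased, whitespace-tokenised)."""
    return [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]


def tf_idf(docs, n_min=1, n_max=2):
    """TF-IDF: counts scaled by smoothed IDF, then L2-normalised per document.

    idf(t) = ln((1 + N) / (1 + df(t))) + 1, where df(t) is the number of
    documents containing t (an assumed variant, not taken from the paper).
    """
    counts = bag_of_words(docs, n_min, n_max)
    n_docs = len(counts)
    # Each Counter's keys are unique, so this counts documents, not occurrences.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc_counts in counts:
        vec = {t: tf * idf[t] for t, tf in doc_counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


# Hypothetical toy corpus.
docs = [
    "pemerintah membantah berita hoax itu",
    "berita itu hoax",
    "pemerintah meresmikan jembatan baru",
]
bow = bag_of_words(docs)
tfidf = tf_idf(docs)
print(bow[1])    # unigram and bigram counts for the second document
print(tfidf[2])  # terms unique to one document, e.g. "jembatan", get higher weights
```

Bag of Words treats every n-gram occurrence equally. TF-IDF down-weights n-grams that appear in many documents, so in the third document "jembatan" outweighs "pemerintah", which also occurs in the first document.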
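The overall setup (n-gram features from either Bag of Words or TF-IDF, fed to an MLP, scored by macro-averaged F1) can be sketched with scikit-learn. This assumes scikit-learn is available. The hidden-layer size, iteration count, n-gram range and the toy sentences are illustrative assumptions, not the configuration or data reported in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


def build_pipeline(use_tfidf, ngram_range=(1, 2)):
    """Feature extractor (BoW or TF-IDF over n-grams) followed by an MLP classifier.

    Hyperparameters are illustrative assumptions, not the paper's settings.
    """
    vectorizer_cls = TfidfVectorizer if use_tfidf else CountVectorizer
    vectorizer = vectorizer_cls(ngram_range=ngram_range)
    mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    return Pipeline([("features", vectorizer), ("mlp", mlp)])


# Hypothetical toy data: 1 = hoax, 0 = valid.
train_texts = [
    "pesan berantai ini wajib disebarkan sekarang juga",
    "sebarkan sebelum dihapus berita rahasia",
    "viral uang gratis dibagikan segera sebarkan",
    "pemerintah meresmikan jembatan baru di kota",
    "bank indonesia menahan suku bunga acuan bulan ini",
    "kementerian merilis data inflasi terbaru",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["segera sebarkan pesan ini", "data inflasi bulan ini dirilis"]
test_labels = [1, 0]

results = {}
for name, use_tfidf in [("bow", False), ("tfidf", True)]:
    pipe = build_pipeline(use_tfidf)
    pipe.fit(train_texts, train_labels)
    predictions = pipe.predict(test_texts)
    # Macro-averaged F1: unweighted mean of the per-class F1 scores.
    results[name] = f1_score(test_labels, predictions, average="macro", zero_division=0)

print(results)
```

Keeping the vectorizer inside the `Pipeline` ensures the vocabulary and IDF weights are learned only from the training texts, which matters when comparing the two representations on held-out data.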
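As a consistency check on the reported scores: the hoax-class F1 follows from its precision P and recall R. With two classes, the macro-averaged F1 is the unweighted mean of the two per-class F1 scores, so the valid-class F1 can be estimated from it. The valid-class value is derived here rather than reported in the paper, and it is only approximate because the published scores are rounded to two decimals.

```latex
\begin{aligned}
F_{1,\text{hoax}} &= \frac{2PR}{P + R} = \frac{2(0.84)(0.73)}{0.84 + 0.73} = \frac{1.2264}{1.57} \approx 0.78 \\
\text{Macro-}F_1 &= \tfrac{1}{2}\left(F_{1,\text{hoax}} + F_{1,\text{valid}}\right) = 0.82 \\
\Rightarrow\; F_{1,\text{valid}} &\approx 2(0.82) - 0.78 = 0.86
\end{aligned}
```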
