Bot Spammer Detection on Twitter Using Smith-Waterman Similarity and Time Interval Entropy

Twitter is a social media platform on which users interact through text-based posts (tweets) of up to 140 characters, which may include photos, videos, and hyperlinks. Spam tweets carry harmful messages that are sent continuously. They are not only a nuisance but also a danger to recipients, and the problem is made worse by bots that spread spam automatically and quickly and can cause data damage. This study aims to detect spam bots by using two criteria: the similarity between tweets, measured with the Smith-Waterman algorithm, and the interval between posting times. Tweet data (id, text, time, and link) are collected with Python scraping libraries, based on available labeled datasets. The text is cleaned in preprocessing steps before the calculations are performed. The resulting similarity and posting-time-interval values are then classified with k-Nearest Neighbor against the previously labeled dataset to obtain a spam-bot or legitimate prediction. In classification experiments with several values of k, using the similarity and interval-entropy criteria, the best result was obtained with k = 3 nearest neighbors and 10-fold cross-validation: 80% detection accuracy, 84% precision, and 84% recall.
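The two features described above can be illustrated with a minimal Python sketch. It is not the study's implementation. Everything specific in it is an assumption made for illustration: the word-level tokenization, the scoring parameters (match = 2, mismatch = -1, gap = -1), the normalization by the shorter tweet's maximum score, timestamps given in seconds, and the 60-second bin width used for the entropy. The function names are hypothetical.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between two token sequences.

    Scoring values are illustrative assumptions, not taken from the study.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, so cells never go below 0.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def tweet_similarity(text_a, text_b, match=2):
    """Scale the score to [0, 1] by the best score the shorter tweet could reach."""
    a, b = text_a.lower().split(), text_b.lower().split()
    if not a or not b:
        return 0.0
    score = smith_waterman_score(a, b, match=match)
    return score / (match * min(len(a), len(b)))


def interval_entropy(timestamps, bin_width=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    timestamps: posting times in seconds; bin_width is an assumed bin size.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(g // bin_width) for g in gaps)
    n = len(gaps)
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

Under these assumptions, an account that posts near-identical tweets at a fixed interval yields a similarity close to 1 and an entropy close to 0, while varied text posted at irregular times yields lower similarity and higher entropy. Per-account values of this kind are what a k-Nearest Neighbor classifier would receive as its feature vector.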
