Detection Of Spam Comments On Instagram Using Complementary Naïve Bayes

Instagram (IG) is a web-based and mobile social media application where users share photos and videos. Each upload can carry a caption describing the photo or video, and such posts often attract spam comments, that is, comments irrelevant to the caption and photo. A difficulty in identifying spam is that non-spam comments outnumber spam comments, which produces an imbalanced dataset. An imbalanced dataset can degrade the performance of a classification method. This research therefore focuses on applying the Complementary Naïve Bayes (CNB) method to imbalanced datasets for detecting Instagram spam comments. The study uses TF-IDF weighting, with Support Vector Machine (SVM) as the comparison classifier. In tests with 2500 training samples and 100 test samples on an imbalanced dataset (25% spam and 75% non-spam), CNB achieved 92% accuracy, 86% precision, and 93% F-measure, whereas SVM achieved 87% accuracy, 79% precision, and 88% F-measure. In conclusion, CNB is the more suitable method for detecting spam comments when the dataset is imbalanced.
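To illustrate why a complement formulation helps with a skewed class distribution, the following is a minimal pure-Python sketch of complement naive Bayes. It is not the study's implementation: it uses raw term counts rather than TF-IDF weights, omits weight normalization, and runs on a made-up toy corpus. The function names `train_cnb` and `predict_cnb`, the smoothing parameter `alpha`, and the example comments are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def train_cnb(docs, labels, alpha=1.0):
    """Fit per-class complement weights from tokenized documents.

    For each class c, term statistics are taken from every class EXCEPT c,
    so a rare class (spam) is scored using the plentiful data of the others.
    """
    classes = sorted(set(labels))
    vocab = sorted({tok for doc in docs for tok in doc})

    # Term counts per class.
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)

    weights = {}
    for c in classes:
        # Complement counts: term occurrences in all classes other than c.
        comp = Counter()
        for other in classes:
            if other != c:
                comp.update(counts[other])
        total = sum(comp.values()) + alpha * len(vocab)
        # Smoothed log-probability of each term under the complement of c.
        weights[c] = {t: math.log((comp[t] + alpha) / total) for t in vocab}
    return weights


def predict_cnb(weights, doc):
    """Return the class whose complement explains the document worst."""
    # Tokens outside the training vocabulary contribute 0 to every class.
    scores = {c: sum(w.get(tok, 0.0) for tok in doc) for c, w in weights.items()}
    return min(scores, key=scores.get)


# Hypothetical toy corpus: 3 non-spam comments, 1 spam comment.
docs = [
    ["nice", "photo"],
    ["love", "this", "photo"],
    ["great", "shot"],
    ["follow", "me", "cheap", "followers"],
]
labels = ["non-spam", "non-spam", "non-spam", "spam"]

model = train_cnb(docs, labels)
spam_guess = predict_cnb(model, ["cheap", "followers", "follow"])
ham_guess = predict_cnb(model, ["nice", "photo"])
```

Because each class is scored from the statistics of its complement, the minority class's weights are estimated from the larger pool of majority-class text, which is the property that makes the approach attractive when spam is the smaller class.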
