PUED: A Social Spammer Detection Method Based on PU Learning and Ensemble Learning

In social networks, people tend to share information with others, so those who use the network frequently are more likely to be influenced by the interests and opinions of other people. Spammers exploit this characteristic, spreading spam through the network for profit and seriously disturbing normal users. Numerous studies have addressed social spammer detection, and the methods fall into three types: unsupervised, supervised, and semi-supervised. Although supervised and semi-supervised methods achieve superior detection accuracy, they usually suffer from imbalanced data, since labeled normal users far outnumber labeled spammers in real situations. To address this problem, we propose a novel method that relies only on labeled normal users to detect spammers accurately. The method has two steps: the first uses a voting classifier to pick out reliable spammers from the unlabeled samples, and the second trains a random forest detector on the normal users and the reliable spammers. Experiments on two real-world social datasets show that our method outperforms other supervised methods.
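The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which base learners make up the voting classifier or how "reliable" is decided, so the three scikit-learn base models, the soft-voting scheme, the `top_fraction` cutoff, and the function names `select_reliable_spammers` and `train_detector` are all hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def select_reliable_spammers(X_normal, X_unlabeled, top_fraction=0.2, seed=0):
    """Step 1 (assumed form): return indices of unlabeled samples that a
    voting classifier scores as least similar to the labeled normal users."""
    # Treat labeled normal users as class 0 and all unlabeled users as class 1.
    X = np.vstack([X_normal, X_unlabeled])
    y = np.concatenate([
        np.zeros(len(X_normal), dtype=int),
        np.ones(len(X_unlabeled), dtype=int),
    ])
    # Hypothetical choice of base learners; the abstract does not name them.
    voter = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("nb", GaussianNB()),
            ("dt", DecisionTreeClassifier(max_depth=3, random_state=seed)),
        ],
        voting="soft",
    )
    voter.fit(X, y)
    # Average predicted probability of the "unlabeled" class (in-sample scores).
    scores = voter.predict_proba(X_unlabeled)[:, 1]
    k = max(1, int(round(top_fraction * len(X_unlabeled))))
    # Keep the k highest-scoring unlabeled samples as reliable spammers.
    return np.argsort(scores)[-k:]


def train_detector(X_normal, X_reliable_spam, seed=0):
    """Step 2: fit a random forest on normal users (0) and reliable spammers (1)."""
    X = np.vstack([X_normal, X_reliable_spam])
    y = np.concatenate([
        np.zeros(len(X_normal), dtype=int),
        np.ones(len(X_reliable_spam), dtype=int),
    ])
    return RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)


if __name__ == "__main__":
    # Synthetic toy data, not the datasets used in the paper.
    rng = np.random.default_rng(0)
    X_normal = rng.normal(0.0, 1.0, size=(200, 5))
    X_unlabeled = np.vstack([
        rng.normal(0.0, 1.0, size=(150, 5)),  # hidden normal users
        rng.normal(3.0, 1.0, size=(50, 5)),   # hidden spammers
    ])
    idx = select_reliable_spammers(X_normal, X_unlabeled)
    detector = train_detector(X_normal, X_unlabeled[idx])
    print("reliable spammers selected:", len(idx))
```

Scoring the unlabeled pool with a model trained to separate "labeled normal" from "unlabeled" is one common way to realize the first step of two-step PU learning; other selection rules (for example, a fixed probability threshold instead of a top fraction) would fit the same outline.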
