Semi-supervised clue fusion for spammer detection in Sina Weibo

Abstract Microblogging platforms are popular social networks that let users collect and spread information on the Internet, but they have also given rise to new forms of spammers who hinder effective information dissemination. Spammers in Sina Weibo use a variety of strategies to evade protection mechanisms, which poses two practical challenges for spammer detection. First, the clues that identify spammers are usually spread across multiple aspects, such as content, behavior, relationship, and interaction. Second, labeled training instances are scarce. This paper proposes a novel approach, Semi-Supervised Clue Fusion (SSCF), for effective spammer detection in Sina Weibo. SSCF learns a linear weighted function that fuses the clues extracted from these aspects into a final result. Starting from a small set of initially labeled instances, SSCF iteratively predicts labels for the unlabeled instances in a semi-supervised fashion. SSCF is empirically validated on real-world data from Sina Weibo, and the results show that it significantly outperforms state-of-the-art baselines.
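
The two ideas named in the abstract, a linear weighted fusion of per-aspect clues and iterative labeling of unlabeled instances, can be illustrated with a minimal sketch. This is not the SSCF algorithm itself, whose details the abstract does not give. The accuracy-based weighting rule, the 0.5 decision threshold, the confidence `margin`, and the function names `fit_weights`, `fuse`, and `self_train` are all assumptions made for illustration.

```python
from typing import List, Tuple

# One score in [0, 1] per aspect; the four aspects (content, behavior,
# relationship, interaction) come from the abstract, the scores are hypothetical.
Clues = List[float]


def fit_weights(X: List[Clues], y: List[int]) -> List[float]:
    """Weight each clue by its accuracy on the labeled set (assumed heuristic)."""
    n_clues = len(X[0])
    acc = []
    for j in range(n_clues):
        correct = sum(1 for x, label in zip(X, y) if (x[j] >= 0.5) == bool(label))
        acc.append(correct / len(X))
    total = sum(acc)
    if total == 0:
        return [1.0 / n_clues] * n_clues  # fall back to uniform weights
    return [a / total for a in acc]  # normalize so the weights sum to 1


def fuse(weights: List[float], x: Clues) -> float:
    """Linear weighted fusion: a single spam score from all clue scores."""
    return sum(w * s for w, s in zip(weights, x))


def self_train(
    X_lab: List[Clues],
    y_lab: List[int],
    X_unl: List[Clues],
    rounds: int = 5,
    margin: float = 0.3,
) -> Tuple[List[float], List[Clues], List[int]]:
    """Iteratively pseudo-label confident unlabeled instances and refit weights."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    weights = fit_weights(X_lab, y_lab)
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            # An instance is "confident" when its fused score is far from 0.5.
            (confident if abs(fuse(weights, x) - 0.5) >= margin else rest).append(x)
        if not confident:
            break  # nothing left that the current model is sure about
        for x in confident:
            X_lab.append(x)
            y_lab.append(1 if fuse(weights, x) >= 0.5 else 0)
        pool = rest
        weights = fit_weights(X_lab, y_lab)
    return weights, X_lab, y_lab
```

In this sketch an instance whose fused score stays near 0.5 is never pseudo-labeled, which limits the propagation of early mistakes at the cost of leaving ambiguous accounts unlabeled.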
