Paper information - Spammers Detection from Product Reviews: A Hybrid Model

Driven by profits, spam reviews for product promotion or suppression have become increasingly rampant on online shopping platforms. This paper focuses on detecting hidden spam users based on product reviews. Numerous studies in the literature have proposed diverse methods for spammer detection, but whether these methods can be combined effectively for higher performance remains unclear. Along this line, a hybrid PU-learning-based Spammer Detection (hPSD) model is proposed in this paper. On one hand, hPSD can detect multiple types of spammers by injecting or recognizing only a small portion of positive samples, which suits real-world application scenarios particularly well. More importantly, hPSD can leverage both user features and user relations to build a spammer classifier via a semi-supervised hybrid learning framework. Experimental results on movie data sets with shilling injection show that hPSD outperforms several state-of-the-art baseline methods. In particular, hPSD shows great potential in detecting hidden spammers as well as their underlying employers in a real-life Amazon data set. These results demonstrate the effectiveness and practical value of hPSD for real-life applications.
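To make the idea of learning from a few known spammers plus many unlabeled users more concrete, here is a minimal sketch in Python. It is not the hPSD model itself, whose details are not given in the abstract. It assumes a generic two-step PU-learning heuristic (pick "reliable negatives" far from the known positives, then score by relative distance to the two centroids), followed by a simple averaging of scores over user relations. The function name `pu_spammer_scores`, the parameters `neg_fraction`, `alpha`, and `rounds`, and the toy users and features are all hypothetical.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pu_spammer_scores(features, positives, edges,
                      neg_fraction=0.5, alpha=0.5, rounds=5):
    """Illustrative PU-style scoring (an assumption, not the paper's algorithm).

    features:  dict user -> list of floats (user behaviour features)
    positives: set of users known to be spammers
    edges:     list of (u, v) pairs describing user relations
    Returns a dict user -> spam score in [0, 1].
    """
    pos_c = centroid([features[u] for u in positives])
    unlabeled = [u for u in features if u not in positives]

    # Step 1: treat the unlabeled users farthest from the positive
    # centroid as "reliable negatives".
    ranked = sorted(unlabeled,
                    key=lambda u: dist(features[u], pos_c), reverse=True)
    k = max(1, int(len(ranked) * neg_fraction))
    neg_c = centroid([features[u] for u in ranked[:k]])

    # Step 2: feature-based score, higher when closer to the positives.
    base = {}
    for u, x in features.items():
        dp, dn = dist(x, pos_c), dist(x, neg_c)
        if u in positives:
            base[u] = 1.0
        else:
            base[u] = dn / (dp + dn) if dp + dn > 0 else 0.5

    # Step 3: blend each unlabeled user's score with the mean score of
    # related users; known positives stay fixed at 1.0.
    nbrs = {u: [] for u in features}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    score = dict(base)
    for _ in range(rounds):
        new = {}
        for u in features:
            if u in positives:
                new[u] = 1.0
            elif nbrs[u]:
                avg = sum(score[v] for v in nbrs[u]) / len(nbrs[u])
                new[u] = (1 - alpha) * base[u] + alpha * avg
            else:
                new[u] = base[u]
        score = new
    return score


# Toy data (hypothetical): s1 and s2 are known spammers, h is an
# unlabeled user related to them, n1..n3 are unlabeled ordinary users.
features = {
    "s1": [0.90, 0.90], "s2": [0.85, 0.95],
    "h": [0.50, 0.50],
    "n1": [0.10, 0.10], "n2": [0.15, 0.05], "n3": [0.50, 0.50],
}
edges = [("h", "s1"), ("h", "s2"), ("n3", "n1"), ("n3", "n2")]
scores = pu_spammer_scores(features, {"s1", "s2"}, edges)
```

In this toy setup, `h` and `n3` have identical features, so any difference between their scores comes only from the relation step. That is the intuition behind combining user features with user relations, although the actual hybrid framework in the paper may work quite differently.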
