Detecting Spammer Groups From Product Reviews: A Partially Supervised Learning Model

Nowadays, online product reviews play a crucial role in consumers' purchase decisions. A high proportion of positive reviews brings substantial sales growth, while negative reviews cause sales loss. Driven by these financial incentives, many spammers try to promote their own products or demote their competitors' products by posting fake and biased online reviews. By registering multiple accounts or posting tasks on crowdsourcing platforms, individual spammers can be organized into spammer groups that manipulate product reviews collectively and are therefore far more damaging. Existing work on spammer group detection extracts spammer group candidates from review data and identifies the real spammer groups with unsupervised spamicity ranking methods. In fact, previous research shows that labeling a small number of spammer groups is easier than one might assume; however, few methods make good use of these valuable labeled data. In this paper, we propose a partially supervised learning model (PSGD) to detect spammer groups. By labeling some spammer groups as positive instances, PSGD applies positive-unlabeled learning (PU-Learning) to learn a classifier, serving as the spammer group detector, from positive instances (labeled spammer groups) and unlabeled instances (unlabeled groups). Specifically, we extract a reliable negative set based on the positive instances and a set of distinctive features. By combining the positive instances, the extracted negative instances, and the unlabeled instances, we convert the PU-Learning problem into the well-known semi-supervised learning problem, and then train a classifier for spammer group detection with a Naive Bayes model and an EM algorithm. Experiments on a real-life Amazon.cn data set show that the proposed PSGD is effective and outperforms state-of-the-art spammer group detection methods.
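The pipeline described above (label a few spammer groups, extract a reliable negative set, then solve the resulting semi-supervised problem with Naive Bayes and EM) can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes each candidate group has already been reduced to a numeric feature vector, takes the unlabeled groups farthest from the positive centroid as reliable negatives (the paper's criterion based on distinctive features may differ), and runs EM with scikit-learn's GaussianNB using fractional sample weights for the unlabeled groups.

```python
# A hedged sketch of a PSGD-style PU-learning pipeline; names and the
# reliable-negative heuristic are assumptions, not the paper's exact method.
import numpy as np
from sklearn.naive_bayes import GaussianNB


def extract_reliable_negatives(X_pos, X_unl, ratio=0.2):
    """Pick the unlabeled groups least similar to the labeled spammer groups."""
    centroid = X_pos.mean(axis=0)
    dist = np.linalg.norm(X_unl - centroid, axis=1)
    k = max(1, int(ratio * len(X_unl)))
    idx = np.argsort(dist)[-k:]               # farthest from the spammer centroid
    mask = np.zeros(len(X_unl), dtype=bool)
    mask[idx] = True
    return X_unl[mask], X_unl[~mask]           # reliable negatives, remaining unlabeled


def psgd_em(X_pos, X_unl, n_iter=10, ratio=0.2):
    """Train a Naive Bayes spammer-group detector from positives + unlabeled groups."""
    X_neg, X_rest = extract_reliable_negatives(X_pos, X_unl, ratio)
    X_lab = np.vstack([X_pos, X_neg])
    y_lab = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])

    clf = GaussianNB().fit(X_lab, y_lab)       # initial classifier on labeled data
    for _ in range(n_iter):
        # E-step: soft spammer-group probabilities for the remaining unlabeled groups
        p = clf.predict_proba(X_rest)[:, 1]
        # M-step: refit, counting each unlabeled group fractionally in both classes
        X_all = np.vstack([X_lab, X_rest, X_rest])
        y_all = np.concatenate([y_lab, np.ones(len(X_rest)), np.zeros(len(X_rest))])
        w_all = np.concatenate([np.ones(len(y_lab)), p, 1.0 - p])
        clf = GaussianNB().fit(X_all, y_all, sample_weight=w_all)
    return clf
```

Groups flagged by the trained classifier (e.g., `clf.predict(X_unl)`) would correspond to the detected spammer groups under these assumptions.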
