Deceptive review detection using labeled and unlabeled data

The availability of millions of products and services on e-commerce sites, with many alternatives for each, makes it difficult to find the product that best suits a given set of requirements. The most popular and useful way to cope with this is to consult reviews on opinionated social media written by people who have already tried the product. Almost all e-commerce sites let users share their views on and experience of the products and services they have used. Customer reviews are increasingly used by individuals, manufacturers, and retailers for purchase and business decisions. Because the reviews received are not scrutinized, anybody can write anything anonymously, which ultimately leads to review spam. Moreover, driven by the desire for profit and/or publicity, spammers produce synthesized reviews to promote some products or brands and to demote those of competitors. Deceptive review spam has grown considerably over time. In this work, we apply supervised as well as unsupervised techniques to identify review spam. The most effective feature sets were assembled for model building, and sentiment analysis was incorporated into the detection process. To obtain the best performance, several well-known classifiers were applied to a labeled dataset. For the unlabeled data, clustering was used for spam detection after the desired attributes had been computed. Additionally, there is a high chance that spam reviewers are also responsible for content pollution in multimedia social networks, because many users now post reviews using their social network logins. Finally, the work can be extended to find suspicious accounts responsible for posting fake multimedia content on the respective social networks.
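The two-track idea described above (a classifier for labeled reviews, clustering for unlabeled ones) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the features, classifiers, or clustering algorithm used, so the two toy features, the tiny sentiment lexicons, the nearest-centroid classifier, and plain k-means are all assumptions chosen to keep the example dependency-free. The review texts and labels are made-up placeholders, not data from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons standing in for a real sentiment resource.
POSITIVE = {"great", "amazing", "best", "perfect", "love"}
NEGATIVE = {"bad", "worst", "terrible", "awful", "hate"}

def features(text):
    """Map a review to [sentiment-word ratio, exclamation-mark ratio]."""
    tokens = re.findall(r"[a-z']+|!", text.lower())
    n = max(len(tokens), 1)
    counts = Counter(tokens)
    sentiment = sum(counts[w] for w in POSITIVE | NEGATIVE) / n
    exclaim = counts["!"] / n
    return [sentiment, exclaim]

def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def train_nearest_centroid(X, y):
    """Supervised track: store one centroid per class label."""
    labels = sorted(set(y))
    cents = [centroid([x for x, t in zip(X, y) if t == lab]) for lab in labels]
    return labels, cents

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    labels, cents = model
    return labels[nearest(x, cents)]

def kmeans(X, k=2, iters=20):
    """Unsupervised track: plain k-means, seeded with the first k points."""
    cents = [list(p) for p in X[:k]]
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [nearest(p, cents) for p in X]
        for j in range(k):
            members = [p for p, a in zip(X, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                cents[j] = centroid(members)
    return assign

# Made-up illustrative reviews (placeholders, not from any dataset).
LABELED = [
    ("Best hotel ever! Amazing! Perfect! Love it!", "deceptive"),
    ("Worst product! Terrible! Awful!", "deceptive"),
    ("The room was clean and the staff answered our questions quickly.", "truthful"),
    ("Battery lasts about two days with normal use.", "truthful"),
]
UNLABELED = [
    "Amazing! Best purchase! Love it!",
    "Delivery took four days and the box was intact.",
    "Perfect! Great! Love this brand!",
    "Screen is sharp but the speakers are quiet.",
]

if __name__ == "__main__":
    model = train_nearest_centroid(
        [features(t) for t, _ in LABELED], [lab for _, lab in LABELED]
    )
    for text in UNLABELED:
        print(predict(model, features(text)), "|", text)
    print(kmeans([features(t) for t in UNLABELED], k=2))
```

In this sketch the cluster indices returned by `kmeans` carry no meaning on their own; deciding which cluster corresponds to spam would still require inspecting the clustered reviews or the computed attributes.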
