Identifying Manipulated Offerings on Review Portals

Recent work has developed supervised methods for detecting deceptive opinion spam: fake reviews written to sound authentic and deliberately mislead readers. Whereas past work has focused on identifying individual fake reviews, this paper aims to identify offerings (e.g., hotels) that contain fake reviews. We introduce a semi-supervised manifold ranking algorithm for this task, which relies on a small set of labeled individual reviews for training. Because no gold-standard labels exist at the offering level, we also introduce a novel evaluation procedure that ranks artificial instances of real offerings, where each artificial offering contains a known number of injected deceptive reviews. Experiments on a novel dataset of hotel reviews show that the proposed method outperforms state-of-the-art learning baselines.
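The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of generic manifold ranking on a graph, in the standard iterative form f <- alpha * S * f + (1 - alpha) * y with S the symmetrically normalized affinity matrix. The function name manifold_rank, the toy affinity matrix, and the choice of alpha are illustrative assumptions; the paper's own graph construction, features, and parameters may differ.

```python
import math


def manifold_rank(weights, seeds, alpha=0.9, iterations=200):
    """Generic manifold ranking sketch (not the paper's exact method).

    weights: symmetric, non-negative affinity matrix as a list of lists.
    seeds:   initial scores y, e.g. 1.0 for nodes labeled deceptive, else 0.0.
    Returns the propagated score of every node.
    """
    n = len(weights)
    # Degree of each node is the sum of its affinities.
    degree = [sum(row) for row in weights]

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] > 0 and degree[i] > 0 and degree[j] > 0:
                s[i][j] = weights[i][j] / math.sqrt(degree[i] * degree[j])

    # Propagate scores: f <- alpha * S * f + (1 - alpha) * y.
    f = list(seeds)
    for _ in range(iterations):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return f


# Hypothetical toy graph: four nodes in a chain, node 0 is the labeled seed.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = manifold_rank(chain, [1.0, 0.0, 0.0, 0.0], alpha=0.5)
# Nodes sorted from most to least similar to the seed.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this toy chain the scores decay with distance from the seed node, which is the behavior a ranking of this kind relies on: unlabeled items that sit close to labeled deceptive reviews on the graph receive higher scores.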
