ORFEL: Efficient detection of defamation or illegitimate promotion in online recommendation

What if a successful company starts to receive a torrent of low-valued (one- or two-star) recommendations for its mobile apps from multiple users within a short period of time, say one month? Is this legitimate evidence that the apps have declined in quality, or an intentional plan (via lockstep behavior) to steal market share through defamation? In the case of a systematic attack on one's reputation, it might not be possible to manually distinguish legitimate from fraudulent interactions within the huge universe of possible user-product recommendations. Previous works have addressed this issue, but none of them took into account the context, modeling, and scale that we consider in this paper. Here, we propose a novel method, Online-Recommendation Fraud ExcLuder (ORFEL), to detect defamation and/or illegitimate promotion of online products by using vertex-centric asynchronous parallel processing of bipartite (users-products) graphs. Our results demonstrate both efficacy and efficiency: over 95% of potential attacks were detected, and ORFEL was at least two orders of magnitude faster than the state of the art. Our main contributions are: (1) a new algorithmic solution; (2) a scalable approach; and (3) a novel context and modeling of the problem, which now addresses both defamation and illegitimate promotion. Our work deals with relevant issues of the Web 2.0, potentially augmenting the credibility of online recommendation to prevent losses to both customers and vendors.
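To make the notion of lockstep behavior on a bipartite users-products graph concrete, the following is a minimal brute-force sketch in Python. It is not the ORFEL algorithm, which the abstract describes only as vertex-centric and asynchronous. The function name `lockstep_pairs`, its parameters, the `(user, product, stars, day)` tuple layout, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def lockstep_pairs(ratings, max_stars=2, window_days=30, min_shared=2):
    """Illustrative only: flag user pairs that gave low ratings to the
    same products at about the same time.

    ratings: iterable of (user, product, stars, day) tuples (assumed layout).
    Returns {(user_a, user_b): number_of_shared_products}.
    """
    # product -> {user: day}, keeping low-star ratings only.
    # Assumption: at most one rating per (user, product); a later one overwrites.
    low_by_product = defaultdict(dict)
    for user, product, stars, day in ratings:
        if stars <= max_stars:
            low_by_product[product][user] = day

    # Count, for every user pair, the products both rated low within the window.
    shared = defaultdict(int)
    for raters in low_by_product.values():
        for (u1, d1), (u2, d2) in combinations(sorted(raters.items()), 2):
            if abs(d1 - d2) <= window_days:
                shared[(u1, u2)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Hypothetical toy data: u1 and u2 attack appA and appB within days of each other.
ratings = [
    ("u1", "appA", 1, 3), ("u2", "appA", 1, 5), ("u3", "appA", 2, 10),
    ("u1", "appB", 2, 4), ("u2", "appB", 1, 6), ("u3", "appB", 5, 7),
    ("u4", "appA", 1, 200),
]
suspicious = lockstep_pairs(ratings)
```

On this toy input only the pair `("u1", "u2")` is flagged, since it is the only pair sharing two low-rated products inside the window. The pairwise enumeration is quadratic in the number of low raters per product, which is exactly the kind of cost that motivates a scalable, parallel formulation for real recommendation graphs.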
