Discovery of Ranking Fraud for Mobile Apps

Ranking fraud in the mobile App market refers to fraudulent or deceptive activities intended to bump Apps up the popularity list. Indeed, it has become increasingly common for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we first propose to accurately locate ranking fraud by mining the active periods, namely leading sessions, of mobile Apps. Such leading sessions can be leveraged for detecting the local anomaly instead of the global anomaly of App rankings. Furthermore, we investigate three types of evidence, i.e., ranking-based evidence, rating-based evidence, and review-based evidence, by modeling Apps' ranking, rating, and review behaviors through statistical hypothesis tests. In addition, we propose an optimization-based aggregation method to integrate all the evidence for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the iOS App Store over a long time period. In the experiments, we validate the effectiveness of the proposed system and show the scalability of the detection algorithm as well as some regularities of ranking fraud activities.
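The abstract does not spell out how leading sessions are mined, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes an App's daily rank series where 1 is the top position, a hypothetical rank threshold `top_k` below which the App counts as "leading", and a hypothetical gap `max_gap` (in days) below which consecutive leading periods are merged into one session. The evidence tests and the aggregation step are not shown.

```python
from typing import List, Optional, Tuple

# (start_day, end_day): inclusive indices into the daily rank series.
Event = Tuple[int, int]


def leading_events(ranks: List[int], top_k: int) -> List[Event]:
    """Return maximal runs of consecutive days on which the App ranks within top_k.

    Assumption: ranks[t] is the App's position on day t (1 = best);
    top_k is a hypothetical threshold, not a value taken from the paper.
    """
    events: List[Event] = []
    start: Optional[int] = None
    for t, r in enumerate(ranks):
        if r <= top_k and start is None:
            start = t  # the App enters the top-k list
        elif r > top_k and start is not None:
            events.append((start, t - 1))  # the App drops out of the top-k list
            start = None
    if start is not None:  # still inside the top-k list when the series ends
        events.append((start, len(ranks) - 1))
    return events


def leading_sessions(events: List[Event], max_gap: int) -> List[List[Event]]:
    """Group adjacent leading events separated by fewer than max_gap days.

    Assumption: max_gap is a hypothetical merging threshold.
    """
    sessions: List[List[Event]] = []
    for event in events:
        if sessions and event[0] - sessions[-1][-1][1] < max_gap:
            sessions[-1].append(event)  # close to the previous event: same session
        else:
            sessions.append([event])  # too far apart: start a new session
    return sessions


if __name__ == "__main__":
    daily_ranks = [120, 80, 9, 3, 1, 4, 15, 60, 7, 2, 30, 200, 150, 140, 130, 8, 5, 90]
    events = leading_events(daily_ranks, top_k=10)
    print(events)  # [(2, 5), (8, 9), (15, 16)]
    print(leading_sessions(events, max_gap=5))  # [[(2, 5), (8, 9)], [(15, 16)]]
```

Each session, rather than the App's whole ranking history, is then the unit that later evidence would be computed over, which is what makes a local rather than global anomaly view possible.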
