iBGP: A Bipartite Graph Propagation Approach for Mobile Advertising Fraud Detection

Online mobile advertising plays a vital financial role in supporting free mobile apps. However, detecting malicious app publishers who generate fraudulent actions on the advertisements hosted in their apps is difficult, because fraudulent traffic often mimics the behavior of legitimate users and evolves rapidly. In this paper, we propose iBGP, a novel bipartite graph-based propagation approach for mobile app advertising fraud detection in large advertising systems. We exploit the characteristics of mobile advertising users' behavior and identify two persistent patterns: power law distribution and pertinence. We then propose an automatic initial score learning algorithm that formulates both patterns to learn the initial scores of non-seed nodes, and a weighted graph propagation algorithm that propagates the scores of all nodes in the user-app bipartite graph until convergence. To extend our approach to large-scale settings, we decompose the objective function of the initial score learning model into separate one-dimensional problems and parallelize the whole approach on an Apache Spark cluster. We applied iBGP to a large synthetic dataset and a large real-world mobile advertising dataset; the experimental results demonstrate that iBGP significantly outperforms other popular graph-based propagation methods.
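To make the idea of propagating scores over a weighted user-app bipartite graph concrete, the following is a minimal sketch of a generic propagation loop. It is not the iBGP algorithm itself: the update rule (a weighted neighbor average damped toward the initial score), the function name `propagate`, the parameters `default`, `alpha`, `tol`, and `max_iter`, and the toy data are all assumptions made for illustration. In particular, non-seed apps start from a fixed default score here, whereas iBGP learns those initial scores.

```python
from collections import defaultdict


def propagate(edges, seed_scores, default=0.5, alpha=0.8, tol=1e-9, max_iter=1000):
    """Generic weighted score propagation on a user-app bipartite graph.

    edges: iterable of (user, app, weight) with positive weights.
    seed_scores: {app: score in [0, 1]} for apps with known labels
                 (1.0 = fraudulent, 0.0 = legitimate in this sketch).
    """
    by_user, by_app = defaultdict(list), defaultdict(list)
    for user, app, w in edges:
        by_user[user].append((app, w))
        by_app[app].append((user, w))

    # Assumption: non-seed apps start from a fixed default score.
    init = {a: seed_scores.get(a, default) for a in by_app}
    app_score = dict(init)
    user_score = {}
    for _ in range(max_iter):
        # A user's score is the weighted mean score of the apps it touches.
        user_score = {
            u: sum(w * app_score[a] for a, w in nbrs) / sum(w for _, w in nbrs)
            for u, nbrs in by_user.items()
        }
        new_app = {}
        for a, nbrs in by_app.items():
            if a in seed_scores:
                new_app[a] = seed_scores[a]  # seed scores stay clamped
                continue
            mean = sum(w * user_score[u] for u, w in nbrs) / sum(w for _, w in nbrs)
            # Damp toward the initial score so the iteration contracts.
            new_app[a] = (1 - alpha) * init[a] + alpha * mean
        delta = max(abs(new_app[a] - app_score[a]) for a in by_app)
        app_score = new_app
        if delta < tol:  # stop once scores no longer change
            break
    return user_score, app_score


# Hypothetical toy graph: edge weights stand in for interaction counts.
edges = [
    ("u1", "bad_app", 5), ("u1", "x", 5),
    ("u2", "good_app", 3), ("u2", "y", 3),
    ("u3", "x", 1), ("u3", "y", 1),
]
user_score, app_score = propagate(edges, {"bad_app": 1.0, "good_app": 0.0})
```

In this toy graph, app `x` shares a heavy user with the fraudulent seed and ends up with a higher score than app `y`, which shares a user with the legitimate seed.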
