GFD: A Weighted Heterogeneous Graph Embedding Based Approach for Fraud Detection in Mobile Advertising

Online mobile advertising plays a vital role in the mobile app ecosystem. Mobile advertising fraud, caused by fraudulent clicks or other fraudulent actions on advertisements, is considered one of the most critical issues in mobile advertising systems. To combat evolving mobile advertising fraud, machine learning methods have been successfully applied to tabular data to distinguish suspicious advertising operations from normal ones. However, such approaches may suffer from labor-intensive feature engineering and limited robustness of the detection algorithms, since online advertising big data and the complex fraudulent advertising actions generated by malicious code, botnets, and click-firms are constantly changing. In this paper, we propose a novel fraud detection approach based on weighted heterogeneous graph embedding and deep learning, namely GFD, to identify fraudulent apps in mobile advertising. In the proposed GFD approach, (i) we construct a weighted heterogeneous graph to represent behavior patterns among users, mobile apps, and mobile ads, and design a weighted metapath-to-vector algorithm to learn node representations (graph-based features) from the graph; (ii) we use a time-window-based statistical analysis method to extract intrinsic features (attribute-based features) from the tabular sample data; and (iii) we propose a hybrid neural network that fuses graph-based features and attribute-based features to distinguish fraudulent apps from normal apps. The GFD approach was applied to a large real-world mobile advertising dataset, and experimental results demonstrate that it significantly outperforms well-known learning methods.
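To make step (i) more concrete, the following is a minimal sketch of a weighted, metapath-guided random walk over a user-app-ad graph. It is not the algorithm from the paper, whose details the abstract does not give. It assumes that the next node is sampled from the neighbors of the required type with probability proportional to edge weight. The toy graph, the edge weights, the `uadau` metapath, and all function names are hypothetical. In a metapath2vec-style pipeline, walks like these would typically be fed to a skip-gram model to obtain node embeddings.

```python
import random
from collections import defaultdict

# Hypothetical toy graph. The node type is the first character of the ID:
# "u" = user, "a" = app, "d" = ad. Weights stand in for interaction counts
# and are illustrative only.
EDGES = [
    ("u1", "a1", 5.0), ("u1", "a2", 1.0),
    ("u2", "a1", 2.0),
    ("a1", "d1", 3.0), ("a2", "d1", 1.0), ("a2", "d2", 4.0),
]


def node_type(node):
    """Return the type tag of a node (assumed to be its first character)."""
    return node[0]


def build_adjacency(edges):
    """Build an undirected weighted adjacency list from (src, dst, weight)."""
    adj = defaultdict(list)
    for src, dst, weight in edges:
        adj[src].append((dst, weight))
        adj[dst].append((src, weight))
    return adj


def weighted_metapath_walk(adj, start, metapath, walk_length, rng):
    """Walk the graph following a cyclic metapath such as "uadau".

    At each step, only neighbors of the type required by the metapath are
    candidates, and one is sampled with probability proportional to its
    edge weight (an assumption of this sketch).
    """
    pattern = metapath[:-1]  # "uadau" repeats as u, a, d, a, u, a, d, a, ...
    if node_type(start) != pattern[0]:
        raise ValueError("start node type does not match the metapath")
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [(n, w) for n, w in adj[walk[-1]] if node_type(n) == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        nodes, weights = zip(*candidates)
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


adj = build_adjacency(EDGES)
walk = weighted_metapath_walk(adj, "u1", "uadau", 9, random.Random(0))
print(walk)
```

Because heavier edges are sampled more often, nodes that interact frequently tend to co-occur in walks, which is one plausible way for edge weights to shape the learned representations.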
