Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection

Graph-based fraud detection has attracted considerable attention recently because graph-structured data carries abundant relational information that can help identify fraudsters. However, GNN-based algorithms can perform poorly when the label distribution of nodes is heavily skewed, which is common in sensitive areas such as financial fraud. To remedy the class imbalance problem in graph-based fraud detection, we propose the Pick and Choose Graph Neural Network (PC-GNN for short) for imbalanced supervised learning on graphs. First, nodes and edges are picked with a devised label-balanced sampler to construct sub-graphs for mini-batch training. Next, for each node in the sub-graph, neighbor candidates are chosen by a proposed neighborhood sampler. Finally, information from the selected neighbors and from different relations is aggregated to obtain the final representation of a target node. Experiments on both benchmark and real-world graph-based fraud detection tasks demonstrate that PC-GNN clearly outperforms state-of-the-art baselines.
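
The pick, choose, and aggregate steps can be illustrated with a minimal Python sketch. It is not the PC-GNN implementation: the abstract does not give the sampling rule, the neighbor-selection criterion, or the aggregator, so the inverse-class-frequency weights, the squared Euclidean distance, the mean aggregation over a single relation, and all function names below are assumptions made purely for illustration.

```python
import random
from collections import Counter


def pick_nodes(labels, num_pick, seed=0):
    """Pick node indices without replacement, favoring rare classes.

    Assumption: a node's weight is 1 / (size of its class), so every
    class carries the same total sampling mass.
    """
    rng = random.Random(seed)
    class_size = Counter(labels)
    weights = [1.0 / class_size[y] for y in labels]
    # Efraimidis-Spirakis weighted sampling: draw key = u ** (1 / w)
    # per node and keep the nodes with the largest keys.
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:num_pick])


def choose_neighbors(node, neighbors, features, keep):
    """Keep the `keep` neighbors closest to `node` in feature space.

    Assumption: closeness is squared Euclidean distance on raw features.
    """
    def sq_dist(other):
        return sum((a - b) ** 2 for a, b in zip(features[node], features[other]))

    return sorted(neighbors, key=sq_dist)[:keep]


def aggregate(node, chosen, features):
    """Mean-pool the target node with its chosen neighbors (one relation only)."""
    rows = [features[node]] + [features[n] for n in chosen]
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    labels = [0] * 95 + [1] * 5  # 5% minority (fraud) class
    print("picked nodes:", pick_nodes(labels, num_pick=10, seed=1))

    features = {0: [0.0], 1: [0.1], 2: [5.0], 3: [0.2]}
    chosen = choose_neighbors(0, [1, 2, 3], features, keep=2)
    print("chosen neighbors:", chosen)
    print("representation:", aggregate(0, chosen, features))
```

With these assumed weights, minority-class nodes are picked far more often than their 5% share of the graph would suggest, while the choose step drops the neighbor whose features lie far from the target node.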
