Alike and Unlike: Resolving Class Imbalance Problem in Financial Credit Risk Assessment

Financial credit risk assessment evaluates customers' credit admission or potential business failure so that action can be taken early, before an actual financial crisis occurs. It aims to predict the probability that a customer belongs to a high-risk group, and is usually formulated as a binary classification problem. However, because high-risk samples are scarce, prevailing models suffer from a severe class-imbalance problem. Oversampling the high-risk users can alleviate this problem, but it also amplifies the effect of noisy examples. In this paper, we propose a novel adversarial data augmentation method to solve the class imbalance problem in financial credit risk assessment. We train a generator to produce synthetic samples, together with a discriminator that distinguishes real instances from fake ones. In addition, an auxiliary risk discriminator is trained cooperatively with the generator to assess credit risk. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed
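The point about oversampling amplifying noise can be illustrated with a minimal sketch. This is not the method proposed in the paper; it is plain random oversampling on toy data, and the function name `random_oversample`, the sample names, and the 95/5 class split are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, minority_label=1, seed=0):
    """Duplicate minority samples at random until both classes are the same size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    # Draw (with replacement) as many extra minority indices as needed to balance.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 95 low-risk customers and 5 high-risk ones,
# where one high-risk entry is assumed to be a noisy (unreliable) example.
samples = [f"low_{i}" for i in range(95)] + [f"high_{i}" for i in range(4)] + ["noisy_high"]
labels = [0] * 95 + [1] * 5

new_samples, new_labels = random_oversample(samples, labels)
class_counts = Counter(new_labels)
noisy_copies = new_samples.count("noisy_high")

print("class counts after oversampling:", dict(class_counts))
print("copies of the noisy example:", noisy_copies)
```

The classes end up balanced, but the single noisy example is now repeated alongside the clean ones, which is the amplification effect the abstract refers to and the motivation for generating new synthetic samples instead of copying existing ones.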
