WiP: Generative Adversarial Network for Oversampling Data in Credit Card Fraud Detection

Credit card transactions take place in enormous numbers all over the world, and gaps in process flows and technology allow many of them to be fraudulent. As the number of reported fraudulent transactions rises, customers incur significant financial losses and credit card service providers incur reputational losses. Building a powerful fraud detection system is therefore paramount. Fraud detection datasets are, by nature, highly unbalanced: fraudulent transactions form a small minority of all records. Consequently, almost all supervised classifiers built on the unbalanced datasets yield high false negative rates. Existing oversampling methods reduce the false negatives but increase the false positives. In this paper, we propose a novel data oversampling method based on the Generative Adversarial Network (GAN). We use a GAN and its variant to generate synthetic data of fraudulent transactions. To evaluate the effectiveness of the proposed method, we train machine learning classifiers on the data balanced by the GAN. The proposed GAN-based oversampling method simultaneously achieved high precision, a high F1-score, and a dramatic reduction in the count of false positives compared to state-of-the-art oversampling methods based on synthetic data generation, such as the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), and random oversampling. Moreover, an ablation study showed that oversampling with an ensemble of SMOTE- and GAN/WGAN-generated datasets is outperformed by the proposed methods in terms of F1-score and false positive count.
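As a rough illustration of the evaluation setting described above, the following minimal sketch shows random oversampling (one of the baselines named in the abstract, not the proposed GAN method) together with the metrics the abstract reports: precision, F1-score, and false positive count. The function names `random_oversample` and `fraud_metrics` and the toy data are hypothetical and chosen only for this example; the paper's actual pipeline is not specified here.

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Hypothetical helper: a plain random-oversampling baseline, not the GAN method.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    n_majority = sum(1 for label in y if label != minority_label)
    n_extra = max(0, n_majority - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + extra, y + [minority_label] * n_extra


def fraud_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and raw false positive/negative counts."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1  # legitimate transaction flagged as fraud
        elif p != positive and t == positive:
            fn += 1  # fraudulent transaction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positives": fp,
        "false_negatives": fn,
    }


# Toy data (assumed for illustration): 8 legitimate rows, 2 fraudulent rows.
X = [[0.1 * i] for i in range(8)] + [[5.0], [5.5]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
class_counts = Counter(y_bal)

# Toy predictions (assumed): one fraud caught, one missed, one false alarm.
metrics = fraud_metrics([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
```

A GAN-based oversampler would replace the duplication step with samples drawn from a trained generator; the metric computation would stay the same, which is what makes the false positive counts of the different oversamplers comparable.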
