Horse race analysis in credit card fraud—deep learning, logistic regression, and Gradient Boosted Tree

In fraud detection, incremental gains in predictive accuracy can bring large benefits to banks and customers. Banks adapt their models to the novel ways in which “fraudsters” commit credit card fraud, collecting data and engineering new features to increase predictive power. This research compares the predictive power of three supervised classification models: logistic regression, gradient boosted trees, and deep learning. It also explores the benefits of two feature engineering approaches: creating features using domain expertise, and using an autoencoder, an unsupervised feature engineering method. These two approaches, combined with the direct mapping of the original variables, yield six different feature sets, and the three models are compared across all of them. The research concludes that features created using domain expertise offer a notable improvement in predictive power. Additionally, the autoencoder offers a way to reduce the dimensionality of the data and slightly boost predictive power.
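As a rough illustration of the "horse race" setup, the sketch below ranks several (model, feature set) entries by a single score on a shared held-out set. It makes several assumptions that the abstract does not state: ROC AUC is used as the measure of predictive power, the names `roc_auc` and `horse_race` are hypothetical helpers, and the labels and scores are made-up toy numbers that do not reflect the study's data or results. In the study itself, each entry would be one of the three models trained on one of the six feature sets.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels are 0 (legitimate) or 1 (fraud); tied scores share an average rank.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one fraud and one non-fraud case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def horse_race(
    labels: Sequence[int], scores_by_entry: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score every entry on the same labels and return them best-first."""
    results = {name: roc_auc(labels, s) for name, s in scores_by_entry.items()}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy held-out labels: fraud is the rare class (made-up data).
    y_true = [0, 0, 0, 0, 0, 0, 1, 1]
    # Placeholder entries; real keys would name a model and a feature set.
    entries = {
        "model_a / feature_set_1": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.8],
        "model_b / feature_set_2": [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.7, 0.9],
    }
    for name, auc in horse_race(y_true, entries):
        print(f"{name}: AUC = {auc:.3f}")
```

Scoring every entry on the same held-out labels is what makes the comparison fair: any difference in rank then reflects the model or the feature set, not a difference in evaluation data.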
