Credit Card Fraud Detection Using Various Classification and Sampling Techniques: A Comparative Study

With the growth of e-commerce, the use of credit cards for online shopping has increased significantly, and credit card fraud has risen with it. Effective fraud detection systems have therefore become essential for banks to limit their losses on credit card transactions. Numerous systems for monitoring credit card transactions have been proposed in the literature, and various machine learning techniques have been applied to predict whether a particular transaction is fraudulent. The biggest challenge for these techniques is the lack of a balanced dataset, which follows from the nature of the transactions: fraudulent transactions are far fewer than genuine ones. This work addresses the challenge by balancing the dataset. Five machine learning techniques (Random Forest, Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Logistic Regression) were applied to datasets balanced with different sampling techniques: oversampling, undersampling, both (combined over- and undersampling), ROSE, and SMOTE. Evaluated with the AUC-ROC performance metric, logistic regression achieves an accuracy of 97.04% and a precision of 99.99%.
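To make the balancing step concrete, the following is a minimal sketch of the two simplest sampling techniques named above, random oversampling and random undersampling, using only the Python standard library. It is an illustration under stated assumptions, not the study's implementation: the function names `random_oversample` and `random_undersample` are hypothetical, the data are synthetic (990 genuine and 10 fraudulent transactions with two made-up features), and ROSE and SMOTE, which generate synthetic minority samples rather than duplicating or dropping rows, are not shown.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate rows of smaller classes at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so that every class has as
    many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample without replacement down to the minority size.
        kept = rng.sample(pool, target)
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Synthetic, illustrative data only: label 1 = fraud, label 0 = genuine.
data_rng = random.Random(42)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
labels = [1 if i < 10 else 0 for i in range(1000)]

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print("original:    ", dict(Counter(labels)))
print("oversampled: ", dict(Counter(over_labels)))
print("undersampled:", dict(Counter(under_labels)))
```

In this sketch, oversampling grows the fraud class to match the genuine class, while undersampling shrinks the genuine class to match the fraud class. Resampling would normally be applied to the training split only, so that the evaluation data keep the original class distribution.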
