Surrogate techniques for testing fraud detection algorithms in credit card operations

Banks collect large amounts of historical records corresponding to millions of credit card operations, but, unfortunately, only a small portion, if any, is openly accessible. The reasons include that the records contain confidential customer data and that banks are reluctant to publish quantitative evidence of existing fraud operations. This paper tackles this problem by applying surrogate techniques to generate new synthetic credit card data. The quality of the surrogate multivariate data is guaranteed by constraining them to have the same covariance, marginal distributions, and joint distributions as the original multivariate data. The performance of fraud detection algorithms, measured by receiver operating characteristic (ROC) curves, is tested using varying proportions of real and surrogate data. We demonstrate the feasibility of surrogates in a real scenario with very low false alarm rates and a high disproportion between legitimate and fraud operations.
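To make the idea of a surrogate concrete, the following is a minimal sketch and not the method of this paper. It swaps in a simpler, well-known technique, a Gaussian copula with rank remapping: each variable keeps exactly its original marginal distribution, while the dependence between variables is reproduced only approximately through the correlation of the normal scores. It does not constrain the full joint distribution. The function names (`make_surrogate`, `pearson`, and the helpers) and the toy variables `amounts` and `balances` are hypothetical, and the toy data are simulated, not real card records.

```python
import math
import random
from statistics import NormalDist, fmean


def ranks(column):
    """Return 0-based ranks of the values (ties broken by position)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    result = [0] * len(column)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def normal_scores(column):
    """Map a column to Gaussian scores through its ranks."""
    n = len(column)
    nd = NormalDist()
    return [nd.inv_cdf((r + 0.5) / n) for r in ranks(column)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cholesky(a):
    """Lower-triangular Cholesky factor; fails if `a` is not positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def make_surrogate(data, seed=0):
    """Build a surrogate for `data`, given as a list of equal-length columns."""
    rng = random.Random(seed)
    d, n = len(data), len(data[0])
    # Correlation of the normal scores drives the latent Gaussian model.
    scores = [normal_scores(col) for col in data]
    corr = [[1.0 if i == j else pearson(scores[i], scores[j])
             for j in range(d)] for i in range(d)]
    low = cholesky(corr)
    # Draw n correlated Gaussian vectors, stored column by column.
    latent = [[0.0] * n for _ in range(d)]
    for t in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for i in range(d):
            latent[i][t] = sum(low[i][k] * z[k] for k in range(i + 1))
    # Rank remapping: reorder the original values to follow the latent ranks,
    # so every surrogate column is a permutation of the original column.
    surrogate = []
    for col, lat in zip(data, latent):
        sorted_vals = sorted(col)
        surrogate.append([sorted_vals[r] for r in ranks(lat)])
    return surrogate


# Toy example with simulated (not real) data: two correlated variables.
rng = random.Random(1)
amounts = [rng.expovariate(1.0) for _ in range(400)]
balances = [a + 0.5 * rng.gauss(0.0, 1.0) for a in amounts]
surrogate = make_surrogate([amounts, balances], seed=7)
print("original correlation: ", pearson(amounts, balances))
print("surrogate correlation:", pearson(surrogate[0], surrogate[1]))
```

Because each surrogate column is a permutation of the corresponding original column, the marginal distributions match exactly, while the correlation is matched only up to sampling error and the limits of the Gaussian copula assumption. A surrogate set built this way could then be mixed with real records in varying proportions before computing ROC curves, in the spirit of the experiment described above.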
