Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization

Credit card fraud detection is a challenging problem because of the specific nature of transaction data and of the labeling process. Transaction data arrive as a stream, are strongly imbalanced, and are prone to non-stationarity. Labeling is the outcome of an active learning process: every day, human investigators contact only a small number of cardholders (those associated with the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and investigate their fraud detection accuracy. We compare different criteria (supervised, semi-supervised and unsupervised) for querying unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
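
The exploitation/exploration trade-off mentioned above can be illustrated with a minimal sketch. It is not the paper's method: the function name `select_queries`, the `explore_fraction` parameter, and the choice of uniform random sampling for exploration are all assumptions made for illustration. It also selects individual transactions rather than cardholders, which is a simplification. Given one day's risk scores from some classifier and a fixed investigation budget, part of the budget goes to the riskiest transactions (exploitation) and the rest to randomly chosen ones (exploration).

```python
import random


def select_queries(risk_scores, budget, explore_fraction=0.2, seed=0):
    """Pick which unlabeled transactions to send to investigators.

    risk_scores: one float per unlabeled transaction (higher = riskier).
    budget: number of transactions that can be investigated today.
    explore_fraction: assumed share of the budget used for random exploration.
    Returns two lists of indices: (exploit, explore).
    """
    rng = random.Random(seed)
    budget = min(budget, len(risk_scores))
    n_explore = int(round(budget * explore_fraction))
    n_exploit = budget - n_explore

    # Exploitation: query the transactions the current model rates riskiest.
    ranked = sorted(range(len(risk_scores)),
                    key=lambda i: risk_scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: query a uniform random sample of the remaining transactions,
    # so that regions the model currently scores as low-risk still get labels.
    explore = rng.sample(ranked[n_exploit:], n_explore)
    return exploit, explore


if __name__ == "__main__":
    # Hypothetical scores for ten transactions observed in one day.
    scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.6, 0.2, 0.4, 0.7, 0.15]
    exploit, explore = select_queries(scores, budget=5, explore_fraction=0.4)
    print("exploit:", exploit)
    print("explore:", explore)
```

Setting `explore_fraction=0` reduces the sketch to pure highest-risk querying. Larger values spend more of the budget on transactions the model would otherwise never see labeled.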
