Evaluating the benefits of using proactive transformed-domain-based techniques in fraud detection tasks

Abstract The exponential growth in the number of e-commerce transactions reflects a radical change in the way people buy and sell goods and services. A huge global market now lets them choose sellers or buyers on the basis of multiple criteria (e.g., economic, logistical, ethical, or sustainability-related) instead of being tied to traditional brick-and-mortar channels. On the one hand, this scenario gives people enormous control, at both the private and the corporate level, by allowing them to filter their needs against a wide range of criteria; on the other hand, it has contributed to the growth of fraud involving electronic payment instruments such as credit cards. Big Data Information Security for Sustainability is a research branch that aims to address these issues in relation to their potential implications for sustainability, proposing effective solutions for designing safe environments in which people can operate while exploiting the benefits of new technologies. Fraud detection systems are a significant example of such solutions, although the techniques they adopt are typically based on retroactive strategies, which are incapable of preventing fraudulent events. From this perspective, this paper investigates the benefits of adopting proactive fraud detection strategies instead of the canonical retroactive ones, theorizing solutions that can lead toward effective practical implementations. We evaluate two novel proactive strategies that were tested in previous work, one based on the Fourier transform and one based on the Wavelet transform, which move the data (i.e., financial transactions) into a new domain, where they are analyzed and an evaluation model is defined. These strategies allow a fraud detection system to operate proactively, since they do not exploit previous fraudulent transactions, overcoming some important problems that reduce the effectiveness of the canonical retroactive state-of-the-art solutions. The potential benefits and limitations of the proposed proactive approach have been evaluated in a real-world credit card fraud detection scenario by comparing its performance with that of one of the most widely used and best-performing retroactive state-of-the-art approaches (i.e., Random Forests).
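As a rough illustration of the proactive idea described in the abstract, the following Python sketch moves transactions into the frequency domain and scores a new transaction using legitimate transactions only. The abstract does not specify the evaluation model, so everything here is an assumption made for illustration: treating a transaction's feature values as a short sequence, comparing magnitude spectra with cosine similarity, the function names, the toy data, and the threshold of 0.95 are all hypothetical and are not the authors' method.

```python
import numpy as np


def spectrum(transaction):
    """Magnitude spectrum of one transaction, treated as a short sequence (assumption)."""
    return np.abs(np.fft.rfft(np.asarray(transaction, dtype=float)))


def fit_legit_model(legit_transactions):
    """Build the model from legitimate transactions only; no fraud examples are used."""
    return np.array([spectrum(t) for t in legit_transactions])


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def score(model, transaction):
    """Mean cosine similarity between the transaction's spectrum and the legitimate spectra."""
    s = spectrum(transaction)
    return float(np.mean([cosine(s, m) for m in model]))


def is_suspicious(model, transaction, threshold=0.95):
    """Flag a transaction whose spectrum is dissimilar to the legitimate ones.

    The threshold is a hypothetical value chosen for the toy data below.
    """
    return score(model, transaction) < threshold


# Toy data (hypothetical): each row holds four numeric features of one transaction.
legit = [[10, 12, 11, 13], [9, 11, 10, 12], [11, 13, 12, 14]]
model = fit_legit_model(legit)

typical = [10, 11, 11, 12]
unusual = [10, 40, 2, 60]
print(score(model, typical), is_suspicious(model, typical))
print(score(model, unusual), is_suspicious(model, unusual))
```

The model is built without any fraudulent examples, which is the property the abstract calls proactive. A wavelet-based variant would follow the same pattern with a discrete wavelet transform in place of `np.fft.rfft`.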
