A Fraud Detection Model Based on Feature Selection and Undersampling Applied to Web Payment Systems

The volume of electronic transactions has grown considerably in recent years, mainly due to the popularization of e-commerce. This growth has been accompanied by a significant increase in the number of fraud cases, resulting in losses of billions of dollars each year worldwide. It is therefore necessary to develop and apply techniques that can assist in detecting fraud in Web transactions. Because electronic transactions generate a large amount of data, finding the best set of features is essential for identifying fraud. Fraud detection is a specific application of anomaly detection, characterized by a large imbalance between the classes (e.g., fraud or non-fraud), which can be detrimental to feature selection techniques. In this work we evaluate the behavior and impact of feature selection techniques for detecting fraud in a Web transaction scenario, applying undersampling during the feature selection step. To measure the effectiveness of the feature selection approach, we use state-of-the-art classification techniques to identify fraud, using real data from one of the largest electronic payment systems in Latin America. Each fraud detection model thus comprises a feature selection technique and a classification technique. We evaluate our results using the F-Measure and Economic Efficiency metrics. Our results show that the imbalance between the classes reduces the effectiveness of feature selection, and that the undersampling strategy applied in this step improves the final results. With the proposed methodology we achieve good fraud detection performance while reducing the number of features, with financial gains of up to 61% compared to the company's current scenario.
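As a rough illustration of two ingredients named in the abstract, the sketch below shows random undersampling of the majority (non-fraud) class and the F-Measure computed as the harmonic mean of precision and recall. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `undersample` and `f_measure`, the `ratio` parameter, and the label encoding (1 for fraud, 0 for non-fraud) are all hypothetical, and the abstract does not say which undersampling variant or class ratio the authors used.

```python
import random


def undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly drop non-fraud rows (label 0) so that at most
    ratio * (number of fraud rows) of them remain.

    Assumption: labels are 1 for fraud and 0 for non-fraud.
    """
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(legit_idx), int(ratio * len(fraud_idx)))
    kept_legit = rng.sample(legit_idx, n_keep)
    keep = sorted(fraud_idx + kept_legit)  # preserve original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def f_measure(y_true, y_pred):
    """F-Measure (F1) for the fraud class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 5 fraud rows among 100 transactions.
toy_labels = [1] * 5 + [0] * 95
toy_rows = [[float(i)] for i in range(100)]
balanced_rows, balanced_labels = undersample(toy_rows, toy_labels, ratio=1.0)
```

In a pipeline like the one the abstract describes, the balanced sample would be passed to the feature selection step, and the selected features would then be used to train the classifiers. Whether the classifiers themselves are trained on balanced or original data is not stated in the abstract.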
