Refined Weighted Random Forest and Its Application to Credit Card Fraud Detection

Random forest (RF) is widely used in many applications because of its good classification performance. However, its voting mechanism gives every base classifier the same weight. It is more reasonable for some trees to carry relatively high weights and others relatively low weights, because the randomness of bootstrap sampling and attribute selection cannot guarantee that all trees have the same decision-making ability. In this paper we focus on the weighted voting mechanism and propose a novel weighted RF. Experiments on six public datasets show that our method outperforms both the standard RF and another weighted RF. We also apply our method to credit card fraud detection, where experiments show that it again achieves the best results among the compared methods.
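
To make the difference between equal-weight and weighted voting concrete, here is a minimal Python sketch. It is not the weighting scheme proposed in the paper, which the abstract does not specify. The function name `weighted_vote`, the example predictions, and the example weights are all hypothetical, and the assumption that a weight reflects a tree's accuracy on held-out data is only one plausible choice.

```python
from collections import defaultdict


def weighted_vote(tree_predictions, tree_weights=None):
    """Return the class label with the largest total weight.

    tree_predictions: one predicted label per tree, for a single sample.
    tree_weights: one non-negative weight per tree; None means equal
    weights, which reduces to the plain majority vote of a standard RF.
    """
    if tree_weights is None:
        tree_weights = [1.0] * len(tree_predictions)
    if len(tree_weights) != len(tree_predictions):
        raise ValueError("need exactly one weight per tree")
    totals = defaultdict(float)
    for label, weight in zip(tree_predictions, tree_weights):
        totals[label] += weight
    # Ties are broken by the label that sorts first, for determinism.
    return min(totals, key=lambda label: (-totals[label], label))


# Hypothetical example: five trees, labels 0 = legitimate, 1 = fraud.
predictions = [1, 0, 0, 1, 0]
# Hypothetical weights (assumption: each tree's accuracy on held-out data).
weights = [0.95, 0.55, 0.60, 0.90, 0.50]

majority = weighted_vote(predictions)           # equal weights
weighted = weighted_vote(predictions, weights)  # unequal weights
print(majority, weighted)
```

In this illustrative case the three weaker trees win the plain majority vote, while the two stronger trees win the weighted vote, so the two mechanisms return different labels for the same sample.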
