Bagging k-dependence probabilistic networks: An alternative powerful fraud detection tool

Fraud is a global problem that demands increasing attention as modern technology and communication expand rapidly. When statistical techniques are used to detect fraud, a critical factor is whether the detection model is accurate enough to correctly classify each case as fraudulent or legitimate. In this context, the concept of bootstrap aggregating (bagging) arises. The basic idea is to fit a model to each of several bootstrap-replicated datasets, obtain the predicted values from each fitted model, and then combine them into a single predictive classification in order to improve classification accuracy. In this paper, we present, for the first time, a study of the performance of discrete and continuous k-dependence probabilistic networks within the context of bagging predictors for classification. Through a large simulation study and several real datasets, we find that the probabilistic networks are a strong modeling option with high predictive capacity, and with a large improvement under the bagging procedure, when compared to traditional techniques.
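The bagging idea described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the base learner is a discrete naive Bayes classifier with Laplace smoothing (the k = 0 case of a k-dependence network, used here only as a simple stand-in), the replicates are combined by majority vote, and the function names `fit_naive_bayes`, `predict_naive_bayes`, `bagging_fit`, and `bagging_predict` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def fit_naive_bayes(X, y):
    """Fit a discrete naive Bayes model (assumed stand-in base learner)."""
    class_counts = Counter(y)
    # feature_counts[(j, c)][v]: number of class-c rows with feature j equal to v
    feature_counts = defaultdict(Counter)
    values = defaultdict(set)  # distinct values observed for each feature
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            feature_counts[(j, c)][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values, len(y)


def predict_naive_bayes(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, feature_counts, values, n = model
    best_class, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = math.log(nc / n)
        for j, v in enumerate(row):
            count = feature_counts[(j, c)][v]
            score += math.log((count + 1) / (nc + len(values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def bagging_fit(X, y, n_replicates=25, seed=0):
    """Fit one base model per bootstrap replicate of the training data."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_replicates):
        # Bootstrap replicate: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_naive_bayes([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Combine the replicate predictions by majority vote."""
    votes = Counter(predict_naive_bayes(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

With a list of categorical feature tuples `X` and class labels `y`, `bagging_fit(X, y)` returns the list of fitted replicates and `bagging_predict(models, row)` returns the aggregated class for one case. Majority voting is only one way to combine the replicates; averaging predicted class probabilities is a common alternative.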
