A case study of applying boosting naive Bayes to claim fraud diagnosis

We apply the weight-of-evidence reformulation of AdaBoosted naive Bayes scoring due to Ridgeway et al. (1998) to the problem of diagnosing insurance claim fraud. The method combines the predictive power of boosting with the explanatory power of the weight-of-evidence scoring framework. We present the results of an experimental evaluation with an emphasis on discriminatory power, ranking ability, and calibration of probability estimates. The data consist of closed personal injury protection (PIP) automobile insurance claims from accidents that occurred in Massachusetts (USA) during 1993 and that were previously investigated for suspicion of fraud by domain experts. The data mimic the most commonly occurring configuration in practice: claim records made up of several binary fraud indicators. The findings of the study show the method to be a valuable contribution to the design of intelligible, accountable, and efficient fraud detection support.
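To make the scoring framework concrete: for a binary indicator x_i, its weight of evidence for fraud is W(x_i) = log[P(x_i | fraud) / P(x_i | legitimate)], and the naive Bayes log posterior odds of fraud equals the log prior odds plus the sum of these per-indicator weights. The Python sketch below shows one way AdaBoost can be wrapped around such a weighted naive Bayes base learner so that the combined output remains an additive evidence score. It is a minimal illustration under our own simplifying assumptions, not the implementation of Ridgeway et al. (1998): all function names (fit_weighted_nb, boost_naive_bayes, score) are hypothetical, the training loop follows discrete AdaBoost, and the final score aggregates the per-round log odds rather than binary votes so that the per-indicator decomposition survives aggregation.

# Hedged sketch with hypothetical names, not the authors' implementation.
import numpy as np

def fit_weighted_nb(X, y, w, eps=1e-9):
    # Weighted per-indicator conditionals; returns the log prior odds and
    # the weights of evidence for each indicator being on (1) or off (0).
    w1, w0 = w[y == 1], w[y == 0]
    p1 = (X[y == 1] * w1[:, None]).sum(axis=0) / (w1.sum() + eps)  # P(x=1 | fraud)
    p0 = (X[y == 0] * w0[:, None]).sum(axis=0) / (w0.sum() + eps)  # P(x=1 | legit)
    p1 = np.clip(p1, eps, 1 - eps)
    p0 = np.clip(p0, eps, 1 - eps)
    prior = np.log(w1.sum() / w0.sum())       # log prior odds of fraud
    woe_on = np.log(p1 / p0)                  # evidence when an indicator fires
    woe_off = np.log((1 - p1) / (1 - p0))     # evidence when it does not
    return prior, woe_on, woe_off

def nb_log_odds(X, prior, woe_on, woe_off):
    # Naive Bayes log posterior odds: prior plus additive evidence weights.
    return prior + X @ woe_on + (1 - X) @ woe_off

def boost_naive_bayes(X, y, rounds=10):
    # Discrete AdaBoost over weighted naive Bayes base learners.
    n = len(y)
    w = np.full(n, 1.0 / n)
    models = []
    for _ in range(rounds):
        prior, won, woff = fit_weighted_nb(X, y, w)
        pred = (nb_log_odds(X, prior, won, woff) > 0).astype(int)
        err = float(w[pred != y].sum())
        if err >= 0.5:                        # no better than chance: stop
            break
        err = max(err, 1e-10)                 # guard against a perfect round
        alpha = 0.5 * np.log((1 - err) / err)
        models.append((alpha, prior, won, woff))
        w = w * np.exp(np.where(pred != y, alpha, -alpha))
        w = w / w.sum()                       # re-normalize case weights
    return models

def score(X, models):
    # Alpha-weighted sum of per-round log odds, mapped to a probability.
    s = sum(a * nb_log_odds(X, p, won, woff) for a, p, won, woff in models)
    return 1.0 / (1.0 + np.exp(-s))

# Toy usage on synthetic binary indicators (fraud rate about 30%):
rng = np.random.default_rng(0)
y = (rng.random(1000) < 0.3).astype(int)
X = (rng.random((1000, 10)) < np.where(y[:, None] == 1, 0.4, 0.15)).astype(float)
models = boost_naive_bayes(X, y, rounds=5)
p = score(X, models)
print(p[y == 1].mean(), p[y == 0].mean())     # fraud claims should score higher

Because every boosting round contributes per-indicator log-likelihood ratios, the final score of any claim still decomposes into a sum of evidence weights, one per fraud indicator, which is what keeps the resulting scores intelligible and accountable to claims investigators.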

[1] David G. Stork, et al. Pattern Classification, 1973.

[2] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[3] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[4] Jerry Nedelman, et al. Book review: "Bayesian Data Analysis," Second Edition, by A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Chapman & Hall/CRC, 2004, 2005, Comput. Stat.

[5] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[6] D. Titterington, et al. Comparison of Discrimination Techniques Applied to a Complex Data Set of Head Injured Patients, 1981.

[7] David Madigan, et al. Statistical Analysis of Clinical Variables to Predict the Outcome of Surgical Intervention in Patients with Knee Complaints, 1999.

[8] D. J. Hand, et al. Statistical methods in diagnosis, 1992, Statistical Methods in Medical Research.

[9] Charles Elkan, et al. Boosting and Naive Bayesian learning, 1997.

[10] Paul N. Bennett, Assessing the Calibration of Naive Bayes Posterior Estimates, 2000.

[11] R. Derrig, Insurance Fraud, 1996.

[12] Judea Pearl, et al. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.

[13] Mark R. Wade, et al. Construction and Assessment of Classification Rules, 1999, Technometrics.

[14] D. Opitz, et al. Popular Ensemble Methods: An Empirical Study, 1999, J. Artif. Intell. Res.

[15] L. Breiman, Arcing classifier (with discussion and a rejoinder by the author), 1998.

[16] Guido Dedene, et al. A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection, 2002.

[17] J. Berger, Statistical Decision Theory and Bayesian Analysis, 1988.

[18] G. W. Snedecor, Statistical Methods, 1964.

[19] Herbert I. Weisberg, et al. Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, 1998.

[20] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[21] Geoffrey I. Webb, et al. MultiBoosting: A Technique for Combining Boosting and Wagging, 2000, Machine Learning.

[22] Pedro M. Domingos, et al. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, 1997, Machine Learning.

[23] Bianca Zadrozny, et al. Learning and making decisions when costs and probabilities are both unknown, 2001, KDD '01.

[24] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[25] J. Swets, ROC analysis applied to the evaluation of medical imaging techniques, 1979, Investigative Radiology.

[26] Johan A. K. Suykens, et al. Least Squares Support Vector Machines, 2002.

[27] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[28] J. Hanley, et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve, 1982, Radiology.

[29] David J. C. MacKay, et al. The Evidence Framework Applied to Classification Networks, 1992, Neural Computation.

[30] Ron Kohavi, et al. The Case against Accuracy Estimation for Comparing Induction Algorithms, 1998, ICML.

[31] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[32] D. Hand, et al. Idiot's Bayes—Not So Stupid After All?, 2001.

[33] D. J. Spiegelhalter, Probabilistic prediction in patient management and clinical trials, 1986, Statistics in Medicine.

[34] I. J. Good, Weight of Evidence: A Brief Survey, 2006.

[35] Ron Kohavi, et al. Visualizing the Simple Bayesian Classifier, 1997.

[36] Geoffrey I. Webb, et al. Lazy Bayesian Rules: A Lazy Semi-Naive Bayesian Learning Technique Competitive to Boosting Decision Trees, 1999, ICML.

[37] Thomas Richardson, et al. Boosting methodology for regression problems, 1999, AISTATS.

[38] Tom Fawcett, et al. Robust Classification for Imprecise Environments, 2000, Machine Learning.

[39] Irving John Good, et al. The Estimation of Probabilities: An Essay on Modern Bayesian Methods, 1965.

[40] 大西 仁, et al. Pearl, J. (1988, second printing 1991). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1994.

[41] John A. Swets, et al. Evaluation of diagnostic systems: methods from signal detection theory, 1982.

[42] Thomas Richardson, et al. Interpretable Boosted Naïve Bayes Classification, 1998, KDD.

[43] Ron Kohavi, et al. Improving simple Bayes, 1997.

[44] Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[45] J. Copas, Plotting p against x, 1983.

[46] Eric Bauer, et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, 1999, Machine Learning.

[47] D. J. Spiegelhalter, Statistical and Knowledge-Based Approaches to Clinical Decision-Support Systems, with an Application in Gastroenterology, 1984.