MLP-ARD vs. logistic regression and C4.5 for PIP claim fraud explication

In this paper we demonstrate the explicative capabilities of multilayer perceptron (MLP) neural networks with automatic relevance determination (ARD) weight regularization for personal injury protection (PIP) automobile insurance claim fraud detection. The ARD scheme of objective-function hyperparameters provides a means for soft input selection, as it makes it possible to determine which predictor variables are most informative to the trained MLP. The MLP (hyper)parameters are learned using MacKay's (1992; 1994) evidence framework implementation of Bayesian learning for classification networks, on a data set of closed PIP insurance claims from accidents that occurred in Massachusetts during 1993. We adopt an experimental strategy of aggregating the decisions of a ten-fold cross-validated ensemble to obtain a robust assessment of predictor importance. The MLP-ARD findings are then compared with the predictor importance evaluations of the popular logistic regression and (smoothed and curtailed) C4.5 decision tree classifiers, trained on the same data under an identical experimental setup. The results show good agreement between logistic regression and MLP-ARD, and somewhat weaker agreement between C4.5 and MLP-ARD.
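To make the soft input selection mechanism concrete, the sketch below trains a one-hidden-layer MLP in which every first-layer weight group (one group per input) gets its own decay hyperparameter alpha_i, and input relevance is read off the fitted hyperparameters in the spirit of ARD. This is a minimal illustration under stated assumptions, not the paper's implementation: the full evidence-framework re-estimation (which requires the Hessian-based effective number of parameters) is replaced by the crude update alpha_i = (group size) / ||w_i||^2, and the toy data, network sizes, and all names are ours.

```python
# Minimal ARD-style soft input selection sketch (illustrative, not the
# paper's code). One weight-decay hyperparameter per input; relevance is
# read from 1/alpha_i after training.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp_ard(X, y, n_hidden=5, n_outer=10, n_inner=300, lr=0.1):
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))  # input-to-hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=n_hidden)       # hidden-to-output weights
    b2 = 0.0
    alpha = np.ones(d)                              # one decay hyperparameter per input

    for _ in range(n_outer):                        # outer loop: re-estimate alphas
        for _ in range(n_inner):                    # inner loop: penalized fit
            H = np.tanh(X @ W1 + b1)                # hidden activations
            p = sigmoid(H @ W2 + b2)                # class-1 probability
            err = p - y                             # cross-entropy gradient at output
            gW2 = H.T @ err / n
            gb2 = err.mean()
            dH = np.outer(err, W2) * (1.0 - H**2)   # backprop through tanh
            gW1 = X.T @ dH / n + alpha[:, None] * W1 / n  # ARD penalty gradient
            gb1 = dH.mean(axis=0)
            W1 -= lr * gW1; b1 -= lr * gb1
            W2 -= lr * gW2; b2 -= lr * gb2
        # Crude stand-in for the evidence update: alpha_i = group size / ||w_i||^2
        # (MacKay's update would use the effective number of parameters gamma_i).
        alpha = n_hidden / (np.sum(W1**2, axis=1) + 1e-8)

    return alpha

# Toy data: only the first two of five inputs carry signal.
X = rng.normal(size=(500, 5))
y = (X[:, 0] - 0.8 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(float)

alpha = train_mlp_ard(X, y)
print("relevance (1/alpha) per input:", np.round(1.0 / alpha, 3))
```

A large fitted relevance 1/alpha_i flags an informative input, since weak decay means the network was allowed to use that input's weights freely. In the paper's setup such relevances would be aggregated over a ten-fold cross-validated ensemble of networks (e.g., by averaging per-fold rankings) rather than read from a single fit.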

[1] Radford M. Neal. Assessing Relevance Determination Methods Using DELVE, 1998.

[2] Bianca Zadrozny, et al. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers, 2001, ICML.

[3] David J. C. MacKay, et al. Bayesian Non-linear Modeling for the Prediction Competition, 1996.

[4] P. Diaconis, et al. Computer-Intensive Methods in Statistics, 1983.

[5] Tom Heskes, et al. Input selection based on an ensemble, 2000, Neurocomputing.

[6] Pedro M. Domingos. Bayesian Averaging of Classifiers and the Overfitting Problem, 2000, ICML.

[7] P. Brockett, et al. Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, 1998.

[8] David J. C. MacKay, et al. The Evidence Framework Applied to Classification Networks, 1992, Neural Computation.

[9] A. F. Smith, et al. Ridge-Type Estimators for Regression Analysis, 1974.

[10] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[11] D. Opitz, et al. Popular Ensemble Methods: An Empirical Study, 1999, J. Artif. Intell. Res.

[12] Eric Bauer, et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, 1999, Machine Learning.

[13] Yoshua Bengio, et al. Gradient-Based Optimization of Hyperparameters, 2000, Neural Computation.

[14] D. MacKay, et al. Bayesian methods for adaptive models, 1992.

[15] Aiko M. Hormann, et al. Programs for Machine Learning. Part I, 1962, Inf. Control.

[16] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.

[17] Paul W. Munro, et al. Reducing Variance of Committee Prediction with Resampling Techniques, 1996, Connect. Sci.

[18] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[19] Michael A. Arbib, et al. The handbook of brain theory and neural networks, 1995, A Bradford book.

[20] Yves Grandvalet. Least Absolute Shrinkage is Equivalent to Quadratic Penalization, 1998.

[21] Ian T. Nabney, et al. Netlab: Algorithms for Pattern Recognition, 2002.

[22] R. Bellman, et al. Adaptive Control Processes, 1964.

[23] Tom Fawcett, et al. Robust Classification for Imprecise Environments, 2000, Machine Learning.

[24] Wray L. Buntine, et al. A theory of learning classification rules, 1990.

[25] Paul D. Allison, et al. Logistic Regression Using the SAS System: Theory and Application, 1999.

[26] Krzysztof Ostaszewski, et al. Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, 1995.

[27] Christopher M. Bishop, et al. Neural networks and machine learning, 1998.

[28] Yves Grandvalet, et al. Outcomes of the Equivalence of Adaptive Ridge with Least Absolute Shrinkage, 1998, NIPS.

[29] Guido Dedene, et al. A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection, 2002.

[30] Guido Dedene, et al. Boosting Naive Bayes for Claim Fraud Diagnosis, 2002, DaWaK.

[31] J. Copas. Regression, Prediction and Shrinkage, 1983.

[32] Toshio Odanaka, et al. Adaptive Control Processes, 1990.

[33] L. Breiman. Random Forests--Random Features, 1999.

[34] Donald W. Marquardt, et al. Comment: You Should Standardize the Predictor Variables in Your Regression Models, 1980.

[35] Koen Vanhoof, et al. Credit classification: A comparison of logit models and decision trees, 1998.

[36] Andrew P. Bradley, et al. The use of the area under the ROC curve in the evaluation of machine learning algorithms, 1997, Pattern Recognit.