Research and application of random forest model in mining automobile insurance fraud

Automobile insurance fraud is spreading worldwide, and detecting it is attracting growing public attention. Because real automobile insurance claims data are both large and class-imbalanced, real data from an automobile insurance company were used to build a random forest fraud mining model grounded in the theory of automobile insurance fraud mining. The data were preprocessed to screen candidate indicators, the importance of each input variable to the output variable was computed, and the error of the model was analyzed. The method was then verified through empirical analysis. The empirical results show that, compared with the traditional model, the fraud mining model based on Random Forest is well suited to large and unbalanced data sets. It can be better used for classifying and predicting automobile insurance claims data and for mining fraud rules, and it has better accuracy and robustness.
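The workflow summarized above (fit a random forest on imbalanced claims data, rank the input variables by importance, and inspect the model error) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (generated with scikit-learn's `make_classification`, with roughly 6% of rows labelled as fraud), the library is scikit-learn, and `class_weight="balanced"` is just one common way to handle class imbalance, not necessarily the approach used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for claims data (assumption): class 1 plays the role
# of "fraud" and makes up roughly 6% of the rows.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=4, n_redundant=2,
    weights=[0.94, 0.06], random_state=0,
)

# Stratified split keeps the fraud rate similar in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",  # reweight classes inversely to their frequency
    oob_score=True,           # estimate accuracy from out-of-bag samples
    random_state=0,
)
forest.fit(X_train, y_train)

# Out-of-bag error: a built-in estimate of generalization error.
oob_error = 1.0 - forest.oob_score_

# Impurity-based importance of each input variable, largest first.
ranked = sorted(
    enumerate(forest.feature_importances_), key=lambda p: p[1], reverse=True
)

# Recall on the minority (fraud) class is more informative than plain
# accuracy when the classes are imbalanced.
fraud_recall = recall_score(y_test, forest.predict(X_test))

print(f"OOB error: {oob_error:.3f}")
print(f"Fraud recall on held-out data: {fraud_recall:.3f}")
for index, importance in ranked[:5]:
    print(f"feature {index}: importance {importance:.3f}")
```

On real claims data, the feature indices would correspond to the screened indicators, and the importance ranking would suggest which of them contribute most to separating fraudulent from legitimate claims.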
