Detection of Automobile Insurance Fraud Using Feature Selection and Data Mining Techniques

This article presents a novel approach to fraud detection in automobile insurance claims that combines several data mining techniques. First, the most relevant attributes are selected from the original dataset using an evolutionary-algorithm-based feature selection method. A test set is then held out from the reduced dataset, and the remaining data are undersampled using the Possibilistic Fuzzy C-Means (PFCM) clustering technique. Next, 10-fold cross-validation on the balanced dataset is used to train and validate a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the best-performing model is used to classify the held-out test set. The efficacy of the proposed system is illustrated through several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach supports the superiority of the proposed system.
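The order of the data-handling steps matters: the test set is held out before any balancing, so it keeps the original class ratio, and only the remaining data are undersampled and split into folds. The following is a minimal sketch of that order only, not the authors' implementation. It assumes synthetic labels with an illustrative class ratio, and it uses plain random undersampling as a stand-in for the PFCM-based step. Feature selection and the WELM classifiers are omitted, and all function names are hypothetical.

```python
import random
from collections import Counter


def stratified_holdout(labels, test_fraction, rng):
    """Split indices into (rest, test), keeping the class ratio in the test set."""
    rest, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        rest.extend(idx[n_test:])
    return sorted(rest), sorted(test)


def random_undersample(indices, labels, rng):
    """Stand-in for PFCM-based undersampling (an assumption of this sketch):
    randomly keep as many samples per class as the smallest class has."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    kept = []
    for cls in sorted(by_class):
        kept.extend(rng.sample(by_class[cls], n_min))
    return sorted(kept)


def k_fold(indices, k, rng):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


# Synthetic labels: 0 = legitimate, 1 = fraudulent (illustrative ratio only).
rng = random.Random(0)
labels = [1] * 60 + [0] * 940

# 1) Hold out the test set first, 2) balance only the remainder,
# 3) build 10 folds on the balanced data for model selection.
rest, test = stratified_holdout(labels, test_fraction=0.2, rng=rng)
balanced = random_undersample(rest, labels, rng)
folds = list(k_fold(balanced, k=10, rng=rng))

print("test set:", Counter(labels[i] for i in test))
print("balanced training pool:", Counter(labels[i] for i in balanced))
print("number of folds:", len(folds))
```

In a full pipeline, each candidate WELM parameter combination would be scored on the ten validation folds, and only the best-scoring model would then be evaluated once on `test`.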
