Use of optimized Fuzzy C-Means clustering and supervised classifiers for automobile insurance fraud detection

Abstract This paper presents a novel hybrid approach for detecting fraud in automobile insurance claims that combines Genetic Algorithm (GA) based Fuzzy C-Means (FCM) clustering with several supervised classifiers. First, a test set is extracted from the original insurance dataset. The remaining training set is then undersampled using the meaningful clusters generated by the clustering technique. Each test instance is matched against these clusters and labeled genuine, malicious, or suspicious. Records labeled genuine or malicious are discarded, while the suspicious cases are analyzed further by four classifiers individually: Decision Tree (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), and Multi-Layer Perceptron (MLP). 10-fold cross-validation is used throughout the work for training and validating the models. The efficacy of the proposed system is illustrated through several experiments on a real-world automobile insurance dataset.
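As a rough illustration of the clustering and triage steps described in the abstract, the following sketch runs plain FCM (without the GA-based optimization) and then sorts test points into the three classes. The abstract does not say how test instances are assigned to classes, so the distance-threshold rule, the function names `fuzzy_c_means`, `memberships`, and `triage`, the `lower` and `upper` thresholds, and the synthetic data are all assumptions made for this example, not the authors' method. Undersampling and the four classifiers are omitted.

```python
import numpy as np


def memberships(X, centers, m=2.0):
    """FCM membership of each row of X in each cluster (rows sum to 1)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain FCM with alternating updates; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # random fuzzy partition to start
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted means
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def triage(X_test, centers, lower, upper):
    """Hypothetical rule: label by distance to the nearest cluster center."""
    d = np.linalg.norm(X_test[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return np.where(nearest <= lower, "genuine",
                    np.where(nearest >= upper, "malicious", "suspicious"))


# Synthetic stand-in for the training set: two tight blobs.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 2)),
    rng.normal(loc=10.0, scale=0.1, size=(50, 2)),
])
centers, U = fuzzy_c_means(X_train, c=2)

X_test = np.array([[0.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
labels = triage(X_test, centers, lower=1.0, upper=5.0)
print(labels)
```

Under this assumed rule, only the points labeled `suspicious` would be passed on to a supervised classifier; the thresholds would need to be tuned on real data.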
