Automobile Insurance Claims Auditing: A Comprehensive Survey on Handling Awry Datasets

Fraud is a costly criminal activity, and insurance companies face the challenging task of identifying and preventing fraudulent claims. As with many large-scale problems in recent years, machine learning has been applied heavily to fraud detection, in both supervised and unsupervised settings. However, supervised models usually perform poorly on awry, asymmetrical datasets. This paper presents a novel approach for auditing claims in automobile insurance. Our data pipeline consists of preprocessing, feature selection, data balancing, and classification. This robust fraud detection model, built upon existing fraud detection research, gives very promising results compared to the state of the art in the industry.
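To make the data balancing stage concrete, here is a minimal sketch of random oversampling, one simple way to even out a skewed label distribution. The abstract does not say which balancing method the pipeline uses, so this is a stand-in for illustration only; the function name `random_oversample`, the two toy features, and the sample values are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows to duplicate for this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Made-up toy claims: (claim_amount, days_to_report); label 1 = fraud.
rows = [(1200, 2), (800, 1), (950, 3), (1100, 2), (7000, 30), (400, 1)]
labels = [0, 0, 0, 0, 1, 0]

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))  # both classes now have the same count
```

In practice, balancing of this kind is applied to the training split only, so that duplicated minority rows do not leak into the evaluation data.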
