Risk prediction in life insurance industry using supervised learning algorithms

Risk assessment is a crucial element of the life insurance business, where it is used to classify applicants. Companies perform underwriting to make decisions on applications and to price policies accordingly. With the growth in available data and advances in data analytics, the underwriting process can be automated so that applications are processed faster. This research aims to enhance risk assessment among life insurance firms using predictive analytics. A real-world dataset with over a hundred anonymized attributes was used for the analysis. Dimensionality reduction was performed to select the prominent attributes that can improve the predictive power of the models. The data dimension was reduced with one feature selection technique and one feature extraction technique, namely Correlation-Based Feature Selection (CFS) and Principal Components Analysis (PCA), respectively. Four machine learning algorithms, namely Multiple Linear Regression, Artificial Neural Network, REPTree and Random Tree, were applied to the dataset to predict the risk level of applicants. With CFS, the REPTree algorithm performed best, achieving the lowest mean absolute error (MAE) of 1.5285 and the lowest root-mean-squared error (RMSE) of 2.027. With PCA, Multiple Linear Regression performed best, achieving the lowest MAE and RMSE of 1.6396 and 2.0659, respectively, compared with the other models.
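The two error metrics used to rank the models can be illustrated with a minimal sketch. It assumes the risk level is treated as a numeric target; the function names and the sample values below are hypothetical and are not taken from the study's dataset or code.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical risk levels for five applicants (illustrative only).
actual = [1, 4, 8, 2, 6]
predicted = [2.0, 4.0, 6.0, 2.0, 5.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

Because RMSE squares each error before averaging, it penalizes large misclassifications of risk more heavily than MAE does, which is why the two values are reported together.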
