Automated Underwriting in Life Insurance: Predictions and Optimisation

Underwriting is an important stage in the life insurance process. It determines whether an applicant is accepted into an insurance fund and, if so, on what terms. It is a tedious and labour-intensive process for both the applicant and the underwriting team. An applicant must complete a lengthy survey containing thousands of questions about their life, and the underwriting team must then process the application, assess the risks the applicant poses and offer insurance products accordingly. Our work implements and evaluates classical data mining techniques that automate some aspects of this process, easing the burden on the underwriting team, and that optimise the survey to improve the applicant experience. We propose Logistic Regression and XGBoost, combined with Recursive Feature Elimination, as techniques for predicting underwriting outcomes. We conduct experiments on a dataset provided by a leading Australian life insurer and show that our early-stage results are promising and provide a foundation for further work in this area.
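The following is a minimal sketch, not the authors' implementation, of how Logistic Regression and Recursive Feature Elimination (RFE) can be combined to shortlist survey questions while predicting an underwriting outcome. It assumes yes/no answers coded as +1/-1 and uses a small synthetic dataset in place of the insurer's data. The question names, the outcome label, and the helpers `fit_logistic` and `rfe` are all hypothetical. At each round RFE refits the model on the surviving questions and drops the one with the smallest absolute coefficient until `n_keep` remain.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Hypothetical helper: unregularised logistic regression fitted by
    batch gradient descent. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this applicant.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, names, n_keep):
    """Hypothetical helper: Recursive Feature Elimination. Refit on the
    surviving questions and drop the one with the smallest |coefficient|
    until n_keep remain. Coefficient magnitudes are only comparable here
    because every answer is coded on the same +1/-1 scale."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(Xa, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy survey (assumption): six yes/no questions instead of thousands.
QUESTIONS = ["smoker", "age_over_50", "bmi_high",
             "owns_pet", "likes_jazz", "drives_red_car"]
TRUE_WEIGHTS = [2.0, 1.5, 1.0, 0.0, 0.0, 0.0]  # last three are pure noise

rng = random.Random(0)
X, y = [], []
for _ in range(400):
    answers = [rng.choice([-1.0, 1.0]) for _ in QUESTIONS]
    logit = sum(w * a for w, a in zip(TRUE_WEIGHTS, answers))
    # Hypothetical label: 1 = application referred for non-standard terms.
    y.append(1 if rng.random() < sigmoid(logit) else 0)
    X.append(answers)

selected = rfe(X, y, QUESTIONS, n_keep=3)
print(selected)
```

Replacing `fit_logistic` with a gradient-boosted model such as XGBoost would change only the ranking criterion (a feature-importance score instead of the absolute coefficient); the elimination loop itself stays the same.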