Automobile insurance classification ratemaking based on telematics driving data

Abstract: Usage-based insurance (UBI) has received growing attention from both insurers and policyholders in recent years, driven by the development of in-vehicle networking and big data technologies. UBI products derive driving behavior variables from telematics data; these variables have stronger causal relationships with accidents and thus improve the pricing accuracy of automobile insurance. This paper investigates the use of an extensive set of driving behavior variables in predicting the risk probability and claim frequency of an insured vehicle. Specifically, logistic regression and four machine learning techniques (support vector machines, random forests, XGBoost, and artificial neural networks) are employed as risk probability models, while Poisson regression serves as the claim frequency model. In addition, to meet the interpretability requirements of insurance pricing, the data augmentation method of variable binning is adopted to discretize continuous variables and construct tariff classes with significant predictive effects. As a result, the pricing framework improves both the interpretability and the predictive accuracy of the model, and thus provides a novel way to implement classification ratemaking for UBI products. Empirical results, based on a dataset from a property and casualty insurance company in China, identify the significant variables and estimate their specific effects on driving risk, verifying the great potential of driving behavior variables in automobile insurance.
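To make the idea of binning a continuous driving variable into tariff classes concrete, here is a minimal sketch. It is not the paper's method: the quantile-based cut points, the variable name `harsh_braking`, and the toy numbers are all hypothetical. It relies on one standard fact: for a Poisson model with a log link, an exposure offset, and a single categorical factor, the fitted claim frequency in each class equals that class's total claims divided by its total exposure.

```python
from bisect import bisect_right


def quantile_cut_points(values, n_bins):
    """Return n_bins - 1 cut points splitting the sorted values into roughly equal groups."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def assign_class(value, cut_points):
    """Map a continuous value to a tariff class index (0 = lowest bin)."""
    return bisect_right(cut_points, value)


def class_frequencies(values, claims, exposures, cut_points):
    """Claims per unit of exposure in each tariff class.

    With a single categorical factor, this matches the Poisson
    maximum-likelihood estimate of the class frequency.
    """
    n_classes = len(cut_points) + 1
    total_claims = [0.0] * n_classes
    total_exposure = [0.0] * n_classes
    for v, c, e in zip(values, claims, exposures):
        k = assign_class(v, cut_points)
        total_claims[k] += c
        total_exposure[k] += e
    return [
        c / e if e > 0 else float("nan")
        for c, e in zip(total_claims, total_exposure)
    ]


# Hypothetical toy data (not from the paper): harsh-braking events per 100 km,
# claim counts, and exposure in vehicle-years.
harsh_braking = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7, 3.5, 4.1, 5.0]
claims = [0, 0, 1, 1, 1, 0, 1, 2, 2]
exposure = [1.0] * 9

cuts = quantile_cut_points(harsh_braking, 3)
freqs = class_frequencies(harsh_braking, claims, exposure, cuts)

# Relativities against the lowest class; this assumes that class has a
# non-zero frequency, which holds for the toy data above.
relativities = [f / freqs[0] for f in freqs]

print("cut points:", cuts)
print("class frequencies:", freqs)
print("relativities:", relativities)
```

Each class then acts as one level of a rating factor, so the resulting relativities can be read directly as a tariff table. A data-driven binning strategy would choose the cut points to maximize predictive effect rather than fixing them at quantiles.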
