Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference

Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are discrete, and the very peculiar shape of the noise distribution, which is asymmetric and fat-tailed, with a large majority of zeros and a few unreliable, very large values. We compare several machine learning methods for estimating insurance premia and test them on a large database of car insurance policies. We find that function approximation methods that do not optimize a squared loss, such as support vector machine regression, do not work well in this context. The compared methods also include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts and makes it possible to reduce the median premium by charging more to the riskiest customers.
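
One way to see why the choice of loss matters with this noise distribution: the constant that minimizes squared loss is the mean claim, while the constant that minimizes absolute loss is the median claim. Support vector machine regression uses an ε-insensitive loss that grows linearly outside the ε-tube, so it behaves more like absolute loss than squared loss. The sketch below is an illustration, not the authors' experiment. It assumes hypothetical synthetic data: about 10% of policies have a claim, and claim sizes follow a Pareto distribution (`alpha=1.5`, scaled by 1000) to mimic the fat tail. The names `simulate_claims`, `squared_loss` and `absolute_loss` are illustrative helpers.

```python
import random
import statistics


def simulate_claims(n, p_claim=0.1, seed=0):
    """Hypothetical synthetic claim amounts: mostly zeros, plus a few
    heavy-tailed (Pareto-distributed) positive claims."""
    rng = random.Random(seed)
    claims = []
    for _ in range(n):
        if rng.random() < p_claim:
            # paretovariate(alpha) returns values >= 1; a small alpha gives a fat tail
            claims.append(1000.0 * rng.paretovariate(1.5))
        else:
            claims.append(0.0)
    return claims


def squared_loss(premium, claims):
    """Mean squared error of charging the same premium to every policy."""
    return sum((y - premium) ** 2 for y in claims) / len(claims)


def absolute_loss(premium, claims):
    """Mean absolute error of charging the same premium to every policy."""
    return sum(abs(y - premium) for y in claims) / len(claims)


claims = simulate_claims(10_000)
mean_premium = statistics.fmean(claims)     # minimizer of squared loss
median_premium = statistics.median(claims)  # minimizer of absolute loss

zero_fraction = sum(c == 0.0 for c in claims) / len(claims)
print(f"fraction of zero claims: {zero_fraction:.2f}")
print(f"mean premium:   {mean_premium:.1f}")
print(f"median premium: {median_premium:.1f}")

# Insurer's balance: total premiums collected minus total claims paid.
print(f"balance, mean premium:   {len(claims) * mean_premium - sum(claims):.1f}")
print(f"balance, median premium: {len(claims) * median_premium - sum(claims):.1f}")
```

Because most policies have no claim, the median premium is zero and collects nothing, while the mean premium balances total claims. This suggests why a premium model should target the conditional mean, which is what squared-loss training estimates.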
