Delta Boosting Machine with Application to General Insurance

ABSTRACT In this article, we introduce Delta Boosting (DB) as a new member of the boosting family. Like the popular Gradient Boosting (GB), DB is presented as a forward stagewise additive model that attempts to reduce the loss at each iteration by sequentially fitting a simple base learner to complement the running predictions. Instead of relying on the negative gradient, as is the case for GB, DB adopts a new measure called delta as the basis. Delta is defined as the loss minimizer at an observation level. We also show that DB is the optimal boosting member for a wide range of loss functions. The optimality is a consequence of DB solving for the split and the adjustment simultaneously to maximize loss reduction at each iteration. In addition, we introduce an asymptotic version of DB that works well for all twice-differentiable strictly convex loss functions. This asymptotic behavior does not depend on the number of observations but rather on a large number of iterations, which can be increased through common regularization techniques. We show that the basis in the asymptotic extension differs from the GB basis only by a multiple of the second derivative of the log-likelihood. This multiple acts as a correction factor that offsets GB's bias toward observations with high second derivatives. When the negative log-likelihood is used as the loss function, the correction can be interpreted as a credibility adjustment for the process variance. The simulation studies and real-data applications we conducted suggest that DB is a significant improvement over GB. The improvement from the asymptotic version is less dramatic but still compelling. Like GB, DB provides high transparency to users: the marginal influence of variables can be reviewed through relative importance charts and partial dependence plots. We can also assess overall model performance by evaluating losses, lifts, and double lifts on the holdout sample.
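To make the contrast between the bases concrete, the following is a minimal, illustrative Python sketch of a boosting step under a Poisson deviance loss with a log link, a setting common in general insurance. It computes the GB basis (the negative gradient, y - mu), the asymptotic DB basis ((y - mu)/mu, i.e., the negative gradient corrected by the second derivative), and the observation-level delta (log(y/mu)). The least-squares tree fit to delta and the leaf-level adjustment log(sum y / sum mu) are simplifying assumptions standing in for the paper's simultaneous split-and-adjustment search; all data, names, and constants here are hypothetical and not taken from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(size=(n, 3))
mu_true = np.exp(0.5 + X[:, 0] - 0.8 * X[:, 1])
y = rng.poisson(mu_true)                      # simulated claim counts

def poisson_nll(y, F):
    # Poisson negative log-likelihood with log link, up to a constant in y.
    return np.mean(np.exp(F) - y * F)

F = np.full(n, np.log(y.mean()))              # running score on the log scale
nu, eps = 0.1, 1e-6                           # shrinkage and a guard for zero counts
print("initial loss:", round(poisson_nll(y, F), 4))

for m in range(50):
    mu = np.exp(F)

    gb_basis  = y - mu                        # GB basis: negative gradient (for comparison only)
    adb_basis = (y - mu) / mu                 # asymptotic DB basis: gradient / second derivative
    delta     = np.log(np.maximum(y, eps) / mu)   # observation-level loss minimizer

    # Partition on the delta basis with a shallow tree, then set each leaf's
    # adjustment to the within-leaf loss minimizer log(sum y / sum mu).
    # This surrogate stands in for the joint split-and-adjustment search.
    tree = DecisionTreeRegressor(max_depth=2).fit(X, delta)
    leaves = tree.apply(X)
    for leaf in np.unique(leaves):
        idx = leaves == leaf
        adj = np.log(max(y[idx].sum(), eps) / mu[idx].sum())
        F[idx] += nu * adj

print("final loss:  ", round(poisson_nll(y, F), 4))
```

Because each leaf adjustment is the exact within-leaf loss minimizer and the shrunken step moves only part of the way toward it, the training loss decreases at every iteration in this sketch; the Poisson zero-count case is handled here with a crude floor (eps), which is purely an assumption of this illustration.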
