BDT: Gradient Boosted Decision Tables for High Accuracy and Scoring Efficiency

In this paper we present gradient boosted decision tables (BDTs). A d-dimensional decision table is essentially a mapping from a sequence of d boolean tests to a real value in ℝ. We propose novel algorithms for fitting decision tables. Our thorough empirical study suggests that decision tables are better weak learners than regression trees in the gradient boosting framework and can improve the accuracy of the boosted ensemble. In addition, we develop an efficient data structure for representing decision tables and a novel fast algorithm that improves the scoring efficiency of boosted ensembles of decision tables. Experiments on public classification and regression datasets demonstrate that our method achieves 1.5x to 6x speedups over a boosted regression trees baseline. We complement our experimental evaluation with a bias-variance analysis that explains how different weak models influence the predictive power of the boosted ensemble. Our experiments suggest that gradient boosting with randomly backfitted decision tables is the most accurate of the compared methods on a number of classification and regression problems. We have deployed a BDT model in the LinkedIn news feed system and achieved a significant lift on key metrics.
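
To make the structure concrete, here is a minimal Python sketch of a decision table and its lookup-based scoring, assuming each boolean test is a threshold comparison on a single feature. The class and function names are illustrative; the paper's optimized in-memory layout and fast ensemble-scoring algorithm are not reproduced here.

```python
class DecisionTable:
    """Sketch of a d-dimensional decision table (names are illustrative).

    The d boolean tests (x[feature[i]] <= threshold[i]) form a d-bit
    index into a lookup array of 2^d real-valued outputs, so scoring is
    d comparisons plus a single array access.
    """

    def __init__(self, features, thresholds, values):
        assert len(features) == len(thresholds)
        assert len(values) == 2 ** len(features)
        self.features = features      # indices of the d tested features
        self.thresholds = thresholds  # one threshold per boolean test
        self.values = values          # 2^d leaf outputs

    def predict(self, x):
        # Each boolean test contributes one bit of the lookup index.
        idx = 0
        for f, t in zip(self.features, self.thresholds):
            idx = (idx << 1) | (x[f] <= t)
        return self.values[idx]


def ensemble_predict(tables, x):
    # A boosted ensemble scores by summing the tables' outputs
    # (shrinkage is assumed to be folded into the leaf values).
    return sum(table.predict(x) for table in tables)


# Toy usage: a 2-dimensional table over features 0 and 1.
table = DecisionTable(features=[0, 1], thresholds=[0.5, 1.5],
                      values=[0.1, -0.2, 0.3, 0.4])
print(table.predict([0.7, 2.0]))  # both tests false -> index 0b00 -> 0.1
```

Because each table touches exactly one of its 2^d cells per example, ensemble scoring reduces to a sequence of small array lookups, which is the access pattern an efficient representation and fast scoring algorithm can exploit.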
