Historical Gradient Boosting Machine

We introduce the Historical Gradient Boosting Machine with the objective of improving the convergence speed of gradient boosting. Our approach is analyzed from the perspective of numerical optimization in function space and exploits gradients from previous iterations, which traditional methods largely ignore. To better exploit the guidance provided by historical gradient information, we combine the accumulated past gradients with the current gradient when computing the descent direction in function space. By fitting the weak learner to this descent direction, the algorithm benefits from historical gradients, which mitigate the greediness of the steepest-descent direction. Experimental results show that our approach improves the convergence speed of gradient boosting without a significant loss in accuracy.
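As a concrete illustration, the following minimal Python sketch applies the idea to squared-error regression: each weak learner is fit to a momentum-style blend of the accumulated historical gradients and the current gradient, rather than to the current gradient alone. The decay coefficient beta, the function names, and the use of scikit-learn regression trees are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_historical_gbm(X, y, n_rounds=100, lr=0.1, beta=0.9, max_depth=3):
    # F_0: constant initial model (mean of the targets)
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    velocity = np.zeros(len(y))  # accumulated historical gradients
    trees = []
    for _ in range(n_rounds):
        grad = y - pred                    # negative gradient of (1/2)(y - F)^2
        velocity = beta * velocity + grad  # blend history with the current gradient
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, velocity)              # weak learner fits the descent direction
        trees.append(tree)
        pred += lr * tree.predict(X)       # take a step in function space
    return base, trees

def predict_historical_gbm(model, X, lr=0.1):
    base, trees = model
    return base + lr * sum(t.predict(X) for t in trees)

With beta = 0, this reduces to ordinary gradient boosting on the steepest-descent direction; larger beta values weight the accumulated history more heavily.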
