Stochastic Gradient Trees

We present an algorithm for learning decision trees using stochastic gradient information as the source of supervision. In contrast to previous approaches to gradient-based tree learning, our method operates in the incremental learning setting rather than the batch learning setting, and does not make use of soft splits or require the construction of a new tree for every update. We demonstrate how one can apply these decision trees to different problems by changing only the loss function, using classification, regression, and multi-instance learning as example applications. In the experimental evaluation, our method performs similarly to standard incremental classification trees, outperforms state-of-the-art incremental regression trees, and achieves performance comparable to batch multi-instance learning methods.
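The idea of swapping tasks by changing only the loss function can be illustrated with a minimal sketch of gradient-based leaf estimation. This is not the paper's algorithm (which also selects splits incrementally from the same gradient statistics); it only shows the pluggable-loss mechanism, using a Newton-style leaf update as in gradient boosting. All names (`Leaf`, `observe`, `apply_update`) are illustrative assumptions.

```python
def squared_error_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true)**2 w.r.t. y_pred.
    Swapping this function for another loss changes the task (e.g. log
    loss for classification) without touching the tree code."""
    return y_pred - y_true, 1.0

class Leaf:
    """One leaf of an incremental tree, accumulating derivative sums."""

    def __init__(self):
        self.value = 0.0      # current prediction at this leaf
        self.grad_sum = 0.0   # running sum of first derivatives
        self.hess_sum = 0.0   # running sum of second derivatives

    def observe(self, y_true, loss_grad_hess):
        # Incremental setting: each instance is seen once and only
        # summary statistics are stored.
        g, h = loss_grad_hess(y_true, self.value)
        self.grad_sum += g
        self.hess_sum += h

    def apply_update(self, reg=0.0):
        # Newton-style step: delta = -sum(g) / (sum(h) + reg).
        self.value -= self.grad_sum / (self.hess_sum + reg)
        self.grad_sum = self.hess_sum = 0.0

leaf = Leaf()
for y in [2.0, 3.0, 4.0]:   # stream of targets routed to this leaf
    leaf.observe(y, squared_error_grad_hess)
leaf.apply_update()
print(leaf.value)           # converges toward the mean under squared error
```

Under squared error the Newton step lands exactly on the mean of the observed targets (3.0 here); a different loss would yield a different optimal leaf value from the same update rule.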
