Accelerating the XGBoost algorithm using GPU computing

8 We present a CUDA based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the GPU and shows high performance with a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelized, combining two approaches. An interleaved approach is used for shallow trees, switching to a more conventional radix sort based approach for larger depths. We show speedups of between 3-6x using a Titan X compared to a 4 core i7 CPU, and 1.2x using a Titan X compared to 2x Xeon CPUs (24 cores). We show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU memory. The algorithm is made available as a plug-in within the XGBoost library∗and fully supports all XGBoost features including classification, regression and ranking tasks. 9

[1]  W. Daniel Hillis,et al.  Data parallel algorithms , 1986, CACM.

[2]  Guy E. Blelloch,et al.  Prefix sums and their applications , 1990 .

[3]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[4]  Nicholas J. Higham,et al.  The Accuracy of Floating Point Summation , 1993, SIAM J. Sci. Comput..

[5]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[6]  Toby Sharp,et al.  Implementing Decision Trees and Forests on a GPU , 2008, ECCV.

[7]  Guo-Heng Luo,et al.  A decision tree using CUDA GPUs , 2011, iiWAS '11.

[8]  A. Grimshaw,et al.  High Performance and Scalable Radix Sorting: a Case Study of Implementing Dynamic Parallelism for GPU Computing , 2011, Parallel Process. Lett..

[9]  Mark J. Harris,et al.  Parallel Prefix Sum (Scan) with CUDA , 2011 .

[10]  Håkan Grahn,et al.  CudaRF: A CUDA-based implementation of Random Forests , 2011, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA).

[11]  Aziz Nasridinov,et al.  Decision tree construction on GPU: ubiquitous parallel computing approach , 2013, Computing.

[12]  Marco Eilers,et al.  Multireduce and Multiscan on Modern GPUs , 2014 .

[13]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[14]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[15]  Damjan Strnad,et al.  Parallel construction of classification trees on a GPU , 2016, Concurr. Comput. Pract. Exp..