Tahoe: tree structure-aware high performance inference engine for decision tree ensemble on GPU

Decision trees are widely used and often assembled into a forest to boost prediction accuracy. However, using decision trees for inference on GPUs is challenging because of irregular memory access patterns and imbalanced workloads across threads. This paper proposes Tahoe, a tree structure-aware high performance inference engine for decision tree ensembles. Tahoe rearranges tree nodes to enable efficient, coalesced memory accesses; it also rearranges trees, such that trees with similar structures are grouped together in memory and assigned to threads in a balanced way. Beyond memory access efficiency, we introduce a set of inference strategies, each of which uses shared memory differently and has different implications for reduction overhead. We introduce performance models to guide the selection of an inference strategy for arbitrary forests and data sets. Tahoe consistently outperforms the state-of-the-art industry-quality library FIL by 3.82x, 2.59x, and 2.75x on three generations of NVIDIA GPUs (Kepler, Pascal, and Volta), respectively.
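To make the node- and tree-rearrangement idea concrete, the sketch below shows one common way such an engine can lay out all trees of a forest in a single structure-of-arrays buffer and let each thread score one sample against every tree; with trees of similar shape placed contiguously, threads of a warp tend to touch nearby node slots at the same traversal depth. This is a minimal illustrative sketch under assumed names (NodeSoA, predict_kernel, tree_roots), not Tahoe's actual data layout or API.

// Illustrative sketch, not Tahoe's code: forest nodes in a structure-of-arrays
// layout; one thread traverses all trees for one input sample.
#include <cuda_runtime.h>

struct NodeSoA {
    const int*   feature;     // split feature per node (-1 marks a leaf)
    const float* threshold;   // split threshold per node
    const float* leaf_value;  // prediction stored at leaf nodes
    const int*   left_child;  // index of left child; right child = left + 1
};

__global__ void predict_kernel(NodeSoA nodes,
                               const int* tree_roots, int num_trees,
                               const float* samples, int num_samples,
                               int num_features, float* out)
{
    int sid = blockIdx.x * blockDim.x + threadIdx.x;
    if (sid >= num_samples) return;

    const float* x = samples + (size_t)sid * num_features;
    float sum = 0.0f;
    for (int t = 0; t < num_trees; ++t) {
        int n = tree_roots[t];
        while (nodes.feature[n] >= 0) {            // descend until a leaf
            bool go_left = x[nodes.feature[n]] <= nodes.threshold[n];
            n = nodes.left_child[n] + (go_left ? 0 : 1);
        }
        sum += nodes.leaf_value[n];                // additive (e.g. GBDT) score
    }
    out[sid] = sum;
}

The shared-memory inference strategies and the reduction trade-offs mentioned above would change how partial per-tree sums are accumulated (per-thread registers versus a block-level reduction); the sketch keeps the simplest per-thread accumulation.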
