Forest Packing: Fast, Parallel Decision Forests

Machine learning has an emerging critical role in high-performance computing to modulate simulations, extract knowledge from massive data, and replace numerical models with efficient approximations. Decision forests are a critical tool because they provide insight into model operation that is essential to interpreting learned results. While decision forests are trivially parallelizable, traversals of their tree data structures incur many random memory accesses and are therefore slow. We present memory packing techniques that reorganize learned forests to minimize cache misses during classification. The resulting layout is hierarchical. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in cache lines. We extend this layout with out-of-order execution and cache-line prefetching to increase memory throughput. Together, these optimizations increase the performance of classification in ensembles by a factor of four over an optimized C++ implementation and a factor of 50 over a popular R language implementation.
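To make the low-level packing concrete, the following is a minimal C++ sketch of interleaving the top levels of several trees into one contiguous array, so that classifying a sample against all trees in lock-step walks adjacent memory. This is an illustration under our own assumptions, not the paper's exact bin structure: the Node fields, the packTopLevels name, and the implicit-heap storage of each tree's top levels (node i has children at 2i+1 and 2i+2) are all hypothetical.

```cpp
// Hedged sketch: interleave the top `depth` levels of T trees so a
// single cache-line fetch tends to hold the same level of several trees.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {
    float    threshold; // split threshold tested at this node
    uint16_t feature;   // index of the feature compared against threshold
    int32_t  leaf;      // >= 0: class label at a leaf; -1: internal node
};

// Assumes each tree stores its top levels in implicit heap order.
// Output layout: [level 0 of tree 0..T-1][level 1 of tree 0..T-1] ...
// with the same within-level position adjacent across trees.
std::vector<Node> packTopLevels(const std::vector<std::vector<Node>>& trees,
                                int depth) {
    const std::size_t numTrees = trees.size();
    std::vector<Node> packed;
    packed.reserve(numTrees * ((std::size_t(1) << depth) - 1));
    for (int level = 0; level < depth; ++level) {
        const std::size_t first = (std::size_t(1) << level) - 1; // first heap index of level
        const std::size_t count = std::size_t(1) << level;       // nodes in this level
        for (std::size_t i = 0; i < count; ++i) {
            for (std::size_t t = 0; t < numTrees; ++t) {
                assert(trees[t].size() >= first + count); // tree must cover this level
                packed.push_back(trees[t][first + i]);
            }
        }
    }
    return packed;
}
```

During classification, a traversal would advance all trees one level at a time through this array, so each memory access serves several trees at once; the deeper levels, laid out by path popularity, and compiler prefetch hints such as GCC/Clang's __builtin_prefetch would extend the same idea toward the out-of-order, prefetched execution described above.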
