A Tensor Compiler for Unified Machine Learning Prediction Serving

Machine Learning (ML) adoption in the enterprise requires simpler and more efficient software infrastructure---the bespoke solutions typical in large web companies are simply untenable. Model scoring, the process of obtaining predictions from a trained model over new data, is a primary contributor to infrastructure complexity and cost, as models are trained once but used many times. In this paper we propose HUMMINGBIRD, a novel approach to model scoring that compiles featurization operators and traditional ML models (e.g., decision trees) into a small set of tensor operations. This approach inherently reduces infrastructure complexity and directly leverages existing investments in neural network compilers and runtimes to generate efficient computations for both CPUs and hardware accelerators. Our performance results are intriguing: despite replacing imperative computations (e.g., tree traversals) with tensor computation abstractions, HUMMINGBIRD is competitive with, and often outperforms, hand-crafted kernels on micro-benchmarks on both CPU and GPU, while enabling seamless end-to-end acceleration of ML pipelines. We have released HUMMINGBIRD as open source.
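To make the core idea concrete, the following is a minimal NumPy sketch of how a single decision tree can be evaluated with dense tensor operations instead of an imperative traversal. The specific tree, the matrices A, B, C, D, E, and the predict helper are illustrative assumptions for this example only, not HUMMINGBIRD's actual code; the real system compiles many more operators and targets frameworks such as PyTorch, ONNX Runtime, and TVM.

    import numpy as np

    # Hypothetical example tree:
    #   node 0: x[0] < 0.5 ? -> leaf A : node 1
    #   node 1: x[1] < 2.0 ? -> leaf B : leaf C
    A = np.array([[1., 0.],          # A[f, i] = 1 if internal node i tests feature f
                  [0., 1.]])
    B = np.array([0.5, 2.0])         # threshold of each internal node
    C = np.array([[ 1., -1., -1.],   # C[i, l] = +1 / -1 if leaf l lies in the left /
                  [ 0.,  1., -1.]])  # right subtree of node i, 0 if i is off its path
    D = np.array([1., 1., 0.])       # per leaf: number of "true" tests on its path
    E = np.array([[1., 0.],          # E[l, c] = 1 if leaf l predicts class c
                  [0., 1.],
                  [1., 0.]])

    def predict(X):
        T = (X @ A < B).astype(np.float64)   # evaluate all node conditions at once
        S = (T @ C == D).astype(np.float64)  # one-hot selection of the reached leaf
        return np.argmax(S @ E, axis=1)      # map the selected leaf to a class

    X = np.array([[0.3, 5.0], [0.9, 1.0], [0.9, 3.0]])
    print(predict(X))                        # [0 1 0]: leaf A, leaf B, leaf C

Because every step is a matrix multiplication or an elementwise comparison, the same computation runs unchanged on CPUs and hardware accelerators through existing tensor runtimes, which is the point the abstract makes about reusing neural network infrastructure.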
