Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft

The application of deep learning (DL) models brings significant improvements to many Microsoft services and products. In this paper, we present our experience and methodology in developing and applying the DeepCPU library for serving DL models in production at large scale, achieving substantial latency improvements and infrastructure cost reductions. We describe two ways to use the library, through customized optimization or framework integration, each targeting different scenarios.