Optimizing CNN Model Inference on CPUs
Yida Wang | Mu Li | Yao Wang | Yizhi Liu | Vin Sharma | Ruofei Yu
[1] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.
[2] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[4] Donald E. Kirk, et al. Optimal control theory: an introduction, 1970.
[5] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.
[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[8] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.
[9] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[10] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[11] Yida Wang, et al. Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs, 2018, ArXiv.
[12] Matei Zaharia, et al. Optimizing DNN Computation with Relaxed Graph Substitutions, 2019, MLSys.
[13] Kunle Olukotun, et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, 2017.
[14] Frédo Durand, et al. Optimizing N-dimensional, Winograd-based convolution for manycore CPUs, 2018, PPoPP.
[15] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Nir Shavit, et al. Deep Tensor Convolution on Multicores, 2016, ICML.
[17] H. Sebastian Seung, et al. ZNN: A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines, 2015, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[18] Song Han, et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA, 2016, FPGA.
[19] Albert Cohen, et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018, ArXiv.
[20] Ian J. Goodfellow, et al. DLVM: A Modern Compiler Framework for Neural Network DSLs, 2017.
[21] Jason Cong, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, 2015, FPGA.
[22] Bernhard Scholz, et al. Nearly Optimal Register Allocation with PBQP, 2006, JMLC.
[23] Yann LeCun, et al. CNP: An FPGA-based processor for Convolutional Networks, 2009, 2009 International Conference on Field Programmable Logic and Applications.
[24] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[25] Frédo Durand, et al. FFT Convolutions are Faster than Winograd on Modern CPUs, Here is Why, 2018, ArXiv.
[26] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[27] Alexander Heinecke, et al. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] R. Bellman. A Markovian Decision Process, 1957.
[29] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[30] Debbie Marr, et al. Hyper-Threading Technology in the NetBurst™ Microarchitecture, 2013.
[31] Kunle Olukotun, et al. The Stanford Hydra CMP, 2000, IEEE Micro.
[32] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[33] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[34] Ninghui Sun, et al. DianNao family, 2016, Commun. ACM.
[35] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[36] Wei Liu, et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.
[37] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[38] Yuxiong He, et al. GRNN: Low-Latency and Scalable RNN Inference on GPUs, 2019, EuroSys.
[39] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[40] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI.
[41] D. Scott Cyphers, et al. Intel® nGraph™, 2018.
[42] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Yongqiang Wang, et al. Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch, 2014, INTERSPEECH.
[44] Minjia Zhang, et al. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster, 2018, USENIX Annual Technical Conference.
[45] Reza Rooholamini, et al. An Empirical Study of Hyper-Threading in High-Performance Computing Clusters, 2002.
[46] Keith D. Cooper, et al. Improvements to graph coloring register allocation, 1994, TOPL.
[47] Bertrand A. Maher, et al. Glow: Graph Lowering Compiler Techniques for Neural Networks, 2018, ArXiv.
[48] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[50] Tianqi Chen, et al. Efficient Deep Learning Inference on Edge Devices, 2018.
[51] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.