Paper information - TVM: End-to-End Compilation Stack for Deep Learning - 字舞流文

TVM: End-to-End Compilation Stack for Deep Learning

Eddie Q. Yan | Carlos Guestrin | Tianqi Chen | A. Krishnamurthy | T. Moreau | Ziheng Jiang | Haichen Shen | Leyuan Wang | Yuwei Hu | L. Ceze

[1] Lane Schwartz, et al. DLVM: A modern compiler infrastructure for deep learning systems, 2017, ICLR.

[2] Shoaib Kamil, et al. The tensor algebra compiler, 2017, Proc. ACM Program. Lang.

[3] Samuel Madden, et al. Weld: Rethinking the Interface Between Data-Intensive Applications, 2017, ArXiv.

[4] Martin Elsman, et al. Futhark: purely functional GPU-programming with nested parallelism and in-place array updates, 2017, PLDI.

[5] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, ArXiv.

[6] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[7] Michel Steuwer, et al. LIFT: A functional data-parallel IR for high-performance GPU code generation, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[8] Vivienne Sze, et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[9] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.

[10] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[11] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.

[12] Elnar Hajiyev, et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[13] Jia Wang, et al. DaDianNao: A Machine-Learning Supercomputer, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[14] Geoffrey Zweig, et al. An introduction to computational networks and the computational network toolkit (invited talk), 2014, INTERSPEECH.

[15] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI.

[16] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.

[17] Razvan Pascanu, et al. Theano: new features and speed improvements, 2012, ArXiv.