TVM : End-to-End Compilation Stack for Deep Learning

[1]  Lane Schwartz,et al.  DLVM: A modern compiler infrastructure for deep learning systems , 2017, ICLR.

[2]  Shoaib Kamil,et al.  The tensor algebra compiler , 2017, Proc. ACM Program. Lang..

[3]  Samuel Madden,et al.  Weld: Rethinking the Interface Between Data-Intensive Applications , 2017, ArXiv.

[4]  Martin Elsman,et al.  Futhark: purely functional GPU-programming with nested parallelism and in-place array updates , 2017, PLDI.

[5]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[6]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[7]  Michel Steuwer,et al.  LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[8]  Vivienne Sze,et al.  Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[9]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[10]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[11]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[12]  Elnar Hajiyev,et al.  PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[13]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  Geoffrey Zweig,et al.  An introduction to computational networks and the computational network toolkit (invited talk) , 2014, INTERSPEECH.

[15]  Frédo Durand,et al.  Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI.

[16]  Francky Catthoor,et al.  Polyhedral parallel code generation for CUDA , 2013, TACO.

[17]  Razvan Pascanu,et al.  Theano: new features and speed improvements , 2012, ArXiv.