SWIRL: High-performance many-core CPU code generation for deep neural networks
Mary W. Hall | Anand Venkat | Leonard Truong | Tharindu Rusira Patabandi | Raj Barik
[1] John Shalf, et al. SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization, 2010.
[2] Saman P. Amarasinghe, et al. A Common Runtime for High Performance Data Analysis, 2017, CIDR.
[3] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.
[4] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018.
[5] Andrew Lavin, et al. Fast Algorithms for Convolutional Neural Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, arXiv.
[7] David A. Padua, et al. A Language for the Compact Representation of Multiple Program Versions, 2005, LCPC.
[8] Prabhat, et al. Application of Deep Convolutional Neural Networks for Detecting Extreme Weather in Climate Datasets, 2016, arXiv.
[9] Yann LeCun, et al. Fast Training of Convolutional Networks through FFTs, 2013, ICLR.
[10] P. Sadayappan, et al. Annotation-based empirical performance tuning using Orio, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[11] Xiang Zhang, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.
[12] Laurence Perreault Levasseur, et al. Fast automated analysis of strong gravitational lenses with convolutional neural networks, 2017, Nature.
[13] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[14] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[15] Geoffrey Zweig, et al. An introduction to computational networks and the computational network toolkit (invited talk), 2014, INTERSPEECH.
[16] Francesco De Carlo, et al. Automated correlative segmentation of large Transmission X-ray Microscopy (TXM) tomograms using deep learning, 2018, Materials Characterization.
[17] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ioannis Mitliagkas, et al. Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] David A. Padua, et al. Locus: A System and a Language for Program Optimization, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[20] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[21] Kunle Olukotun, et al. A domain-specific approach to heterogeneous parallelism, 2011, PPoPP '11.
[22] Ivan Gankevich, et al. Speedup of deep neural network learning on the MIC-architecture, 2016, 2016 International Conference on High Performance Computing & Simulation (HPCS).
[23] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[24] Chun Chen, et al. Loop Transformation Recipes for Code Generation and Auto-Tuning, 2009, LCPC.
[25] Clément Farabet, et al. Torch7: A Matlab-like Environment for Machine Learning, 2011, NIPS 2011.
[26] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[27] Rong Gu, et al. Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-Core Coprocessor, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[28] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.