Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Evangelos Georganas | Dhiraj D. Kalamkar | Sasikanth Avancha | Menachem Adelman | Cristina S. Anderson | Alexander Breuer | J. Bruestle | Narendra Chaudhary | Abhisek Kundu | Denise G. Kutnick | Frank Laub | Vasimuddin Md | Sanchit Misra | Ramanarayan Mohanty | Hans Pabst | Barukh Ziv | Alexander Heinecke