Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning and HPC Workloads
Dhiraj D. Kalamkar | Cristina S. Anderson | Denise G. Kutnick | Sasikanth Avancha | E. Georganas | Barukh Ziv | A. Heinecke | Hans Pabst | Abhisek Kundu | Sanchit Misra | Menachem Adelman | Deepti Aggarwal | Alexander Breuer | J. Bruestle | N. Chaudhary | Frank Laub | Vasimuddin Md | Ramanarayan Mohanty | Brian Retford
[1] A. Heinecke, et al. Next-Generation Local Time Stepping for the ADER-DG Finite Element Method, 2022, 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[2] Abhisek Kundu, et al. Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads, 2021, SC21: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Xiaoyan Liu, et al. The Deep Learning Compiler: A Comprehensive Survey, 2020, IEEE Transactions on Parallel and Distributed Systems.
[5] Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Sanchit Misra, et al. Deep Graph Library Optimizations for Intel(R) x86 Architecture, 2020, ArXiv.
[7] Cody Hao Yu, et al. Ansor: Generating High-Performance Tensor Programs for Deep Learning, 2020, OSDI.
[8] Alexander Heinecke, et al. Harnessing Deep Learning via a Single Building Block, 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[9] Maithra Raghu, et al. A Survey of Deep Learning for Scientific Discovery, 2020, ArXiv.
[10] Christian Plessl, et al. CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations, 2020, The Journal of Chemical Physics.
[11] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.
[12] Johnny Israeli, et al. AtacWorks: A deep convolutional neural network toolkit for epigenomics, 2019, bioRxiv.
[13] Alfio Lazzaro, et al. DBCSR: A Blocked Sparse Tensor Algebra Library, 2019, PARCO.
[14] Alexander Heinecke, et al. Optimizing Deep Learning RNN Topologies on Intel Architecture, 2019, Supercomput. Front. Innov.
[15] David Cox, et al. Triton: an intermediate language and compiler for tiled neural network computations, 2019, MAPL@PLDI.
[16] Yinghai Lu, et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems, 2019, ArXiv.
[17] Paul Barham, et al. Machine Learning Systems are Stuck in a Rut, 2019, HotOS.
[18] Tim Zerrell, et al. Stripe: Tensor Compilation via the Nested Polyhedral Model, 2019, ArXiv.
[19] Chris Yakopcic, et al. A State-of-the-Art Survey on Deep Learning Theory and Architectures, 2019, Electronics.
[20] Kaiming He, et al. Group Normalization, 2018, International Journal of Computer Vision.
[21] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[22] Alexander Heinecke, et al. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Minjia Zhang, et al. DeepCPU: Serving RNN-based Deep Learning Models 10x Faster, 2018, USENIX Annual Technical Conference.
[24] Albert Cohen, et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018, ArXiv.
[25] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[26] Alexander Heinecke, et al. EDGE: Extreme Scale Fused Seismic Simulations with the Discontinuous Galerkin Method, 2017, ISC.
[27] Jure Leskovec, et al. Inductive Representation Learning on Large Graphs, 2017, NIPS.
[28] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Abhinav Vishnu, et al. Deep learning for computational chemistry, 2017, J. Comput. Chem.
[30] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Alan Edelman, et al. Julia: A Fresh Approach to Numerical Computing, 2014, SIAM Rev.
[32] Alexander Heinecke, et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[33] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[34] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, ArXiv.
[35] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[36] Heng-Tze Cheng, et al. Wide & Deep Learning for Recommender Systems, 2016, DLRS@RecSys.
[37] Gisbert Schneider, et al. Deep Learning in Drug Discovery, 2016, Molecular Informatics.
[38] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Torsten Hoefler, et al. Sparse Tensor Algebra as a Parallel Programming Model, 2015, ArXiv.
[40] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[41] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[43] John F. Stanton, et al. A massively parallel tensor contraction framework for coupled-cluster computations, 2014, J. Parallel Distributed Comput.
[44] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[45] Evgeny Epifanovsky, et al. New implementation of high-level correlated methods using a general block tensor library for high-performance electronic structure calculations, 2013, J. Comput. Chem.
[46] Jinyu Li, et al. Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks, 2013, ICLR.
[47] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[48] S. Hirata. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories, 2003.
[49] G. Marsaglia. Xorshift RNGs, 2003, Journal of Statistical Software.
[50] Philippe Flajolet, et al. The Number of Registers Required for Evaluating Arithmetic Expressions, 1979, Theor. Comput. Sci.
[51] J. Gibbs. Elementary Principles in Statistical Mechanics: Developed with Especial Reference to the Rational Foundation of Thermodynamics, 1902.