[1] Lei Liu,et al. Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity , 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[2] Hariharan Sandanagobalane,et al. Diesel: DSL for linear algebra and neural net computations on GPUs , 2018, MAPL@PLDI.
[3] Andrew McCallum,et al. Energy and Policy Considerations for Deep Learning in NLP , 2019, ACL.
[4] Kurt Keutzer,et al. Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization , 2019, MLSys.
[5] Lane Schwartz,et al. DLVM: A modern compiler infrastructure for deep learning systems , 2017, ICLR.
[6] Yi Yang,et al. Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Uday Bondhugula,et al. MLIR: A Compiler Infrastructure for the End of Moore's Law , 2020, ArXiv.
[8] Tianqi Chen,et al. Training Deep Nets with Sublinear Memory Cost , 2016, ArXiv.
[9] Torsten Hoefler,et al. StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems , 2020, ArXiv.
[10] Torsten Hoefler,et al. Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures , 2019, SC.
[11] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[12] Benoît Meister,et al. Polyhedral Optimization of TensorFlow Computation Graphs , 2017, ESPT/VPA@SC.
[13] Michael Carbin,et al. TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning , 2020, ArXiv.
[14] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev.
[15] Matthew Johnson,et al. Compiling machine learning programs via high-level tracing , 2018, SysML.
[16] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Alexander Kolesnikov,et al. MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.
[18] Chen Liang,et al. Carbon Emissions and Large Neural Network Training , 2021, ArXiv.
[19] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[20] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[21] Cody Hao Yu,et al. Ansor: Generating High-Performance Tensor Programs for Deep Learning , 2020, OSDI.
[22] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[23] Andy R. Terrel,et al. SymPy: Symbolic computing in Python , 2017, PeerJ Prepr.
[24] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[25] Alexander Aiken,et al. TASO: optimizing deep learning computation with automatic generation of graph substitutions , 2019, SOSP.
[26] Yinghai Lu,et al. Deep Learning Recommendation Model for Personalization and Recommendation Systems , 2019, ArXiv.
[27] Torsten Hoefler,et al. Accelerating Deep Learning Frameworks with Micro-Batches , 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).
[28] Lidong Zhou,et al. Astra: Exploiting Predictability to Optimize Deep Learning , 2019, ASPLOS.
[29] Mary W. Hall,et al. SWIRL: High-performance many-core CPU code generation for deep neural networks , 2019, Int. J. High Perform. Comput. Appl..
[30] Nikoli Dryden,et al. Data Movement Is All You Need: A Case Study on Optimizing Transformers , 2020, MLSys.
[31] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[32] Matei Zaharia,et al. Optimizing DNN Computation with Relaxed Graph Substitutions , 2019, MLSys.
[33] Max Willsey,et al. Equality Saturation for Tensor Graph Superoptimization , 2021, ArXiv.
[34] Chris Cummins,et al. Value Function Based Performance Optimization of Deep Learning Workloads , 2020, ArXiv.
[35] Rajat Raina,et al. Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.
[36] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[37] Bertrand A. Maher,et al. Glow: Graph Lowering Compiler Techniques for Neural Networks , 2018, ArXiv.
[38] Paul Barham,et al. Machine Learning Systems are Stuck in a Rut , 2019, HotOS.
[39] Albert Cohen,et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions , 2018, ArXiv.
[40] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[42] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[43] Sara Hooker,et al. The hardware lottery , 2020, Commun. ACM.
[44] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004 (CGO 2004).
[45] Diganta Misra. Mish: A Self Regularized Non-Monotonic Activation Function , 2020, BMVC.
[46] Minjie Wang,et al. FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[47] Dan Alistarh,et al. Taming unbalanced training workloads in deep learning with partial collective operations , 2019, PPoPP.
[48] Hong-Yuan Mark Liao,et al. YOLOv4: Optimal Speed and Accuracy of Object Detection , 2020, ArXiv.
[49] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[50] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).