Yong Chen | Yue Jin | Yao Zhang | Rui Zhao | Yongchao Liu | Teng Teng | Hang Ou | Yongqi Chen
[1] D. Scott Cyphers, et al. Intel® nGraph™, 2018.
[2] Bertrand A. Maher, et al. Glow: Graph Lowering Compiler Techniques for Neural Networks, 2018, ArXiv.
[3] Shoaib Kamil, et al. Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code, 2018, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[4] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[5] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.
[6] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[7] John Tran, et al. cuDNN: Efficient Primitives for Deep Learning, 2014, ArXiv.
[8] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[9] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[10] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[11] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.
[12] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[13] Ross B. Girshick, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[14] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.
[15] H. Howie Huang, et al. Performance Analysis of GPU-Based Convolutional Neural Networks, 2016, 2016 45th International Conference on Parallel Processing (ICPP).
[16] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[17] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[18] Chuang Gan, et al. TSM: Temporal Shift Module for Efficient Video Understanding, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Yida Wang, et al. Optimizing CNN Model Inference on CPUs, 2018, USENIX Annual Technical Conference.
[20] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[21] Lane Schwartz, et al. DLVM: A modern compiler infrastructure for deep learning systems, 2017, ICLR.
[22] David Cox, et al. Triton: an intermediate language and compiler for tiled neural network computations, 2019, MAPL@PLDI.
[23] Yixing Lao, et al. nGraph-HE: a graph compiler for deep learning on homomorphically encrypted data, 2018, IACR Cryptol. ePrint Arch.
[24] Albert Cohen, et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018, ArXiv.
[25] Quoc V. Le, et al. A Hierarchical Model for Device Placement, 2018, ICLR.
[26] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[27] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[28] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Alexander Aiken, et al. TASO: optimizing deep learning computation with automatic generation of graph substitutions, 2019, SOSP.
[30] Hariharan Sandanagobalane, et al. Diesel: DSL for linear algebra and neural net computations on GPUs, 2018, MAPL@PLDI.
[31] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI.