A Hardware–Software Blueprint for Flexible Deep Learning Specialization
Thierry Moreau | Tianqi Chen | Luis Vega | Jared Roesch | Eddie Q. Yan | Lianmin Zheng | Josh Fromm | Ziheng Jiang | Luis Ceze | Carlos Guestrin | Arvind Krishnamurthy
[1] Haichen Shen, et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, 2018, OSDI.
[2] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[3] Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.
[4] Shaoli Liu, et al. Cambricon: An Instruction Set Architecture for Neural Networks, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[5] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations, 2015, NIPS.
[6] Song Han, et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[7] Lane Schwartz, et al. DLVM: A Modern Compiler Infrastructure for Deep Learning Systems, 2017, ICLR.
[8] James E. Smith, et al. Decoupled Access/Execute Computer Architectures, 1984, TOCS.
[9] D. Scott Cyphers, et al. Intel® nGraph™, 2018.
[10] Thierry Moreau, et al. Learning to Optimize Tensor Programs, 2018, NeurIPS.
[11] Albert Cohen, et al. Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018, ArXiv.
[12] Ameet Talwalkar, et al. Non-stochastic Best Arm Identification and Hyperparameter Optimization, 2015, AISTATS.
[13] Tianqi Chen, et al. Relay: A New IR for Machine Learning Frameworks, 2018, MAPL@PLDI.
[14] David A. Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).