Exploiting approximate computing for deep learning acceleration
Swagath Venkataramani | Vijayalakshmi Srinivasan | Kailash Gopalakrishnan | Jungwook Choi | Chia-Yu Chen