Precision Batching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs
Vijay Janapa Reddi | Maximilian Lam | Colby R. Banbury | Zachary Yedidia
[1] Thierry Moreau,et al. Automating Generation of Low Precision Deep Learning Operators , 2018, ArXiv.
[2] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[3] Jakub W. Pachocki,et al. Dota 2 with Large Scale Deep Reinforcement Learning , 2019, ArXiv.
[4] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[5] Yuval Tassa,et al. DeepMind Control Suite , 2018, ArXiv.
[6] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[7] Patrick Judd,et al. Stripes: Bit-serial deep neural network computing , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[8] Thierry Moreau,et al. Automatic generation of high-performance quantized machine learning kernels , 2020, CGO.
[9] Shuohang Wang,et al. Learning Natural Language Inference with LSTM , 2015, NAACL.
[10] Shuchang Zhou,et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.
[11] Zhiru Zhang,et al. Improving Neural Network Quantization without Retraining using Outlier Channel Splitting , 2019, ICML.
[12] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[13] Daisuke Miyashita,et al. LogNet: Energy-efficient neural networks using logarithmic computation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Hongbin Zha,et al. Alternating Multi-bit Quantization for Recurrent Neural Networks , 2018, ICLR.
[15] Christopher Potts,et al. A large annotated corpus for learning natural language inference , 2015, EMNLP.
[16] Maximilian Lam,et al. Quantized Reinforcement Learning (QUARL) , 2019, ArXiv.
[17] Yoshua Bengio,et al. Neural Networks with Few Multiplications , 2015, ICLR.
[18] Jeff Johnson,et al. Rethinking floating point for deep learning , 2018, ArXiv.
[19] Sergio Gomez Colmenarejo,et al. Acme: A Research Framework for Distributed Reinforcement Learning , 2020, ArXiv.
[20] Carole-Jean Wu,et al. DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference , 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA).
[21] Andreas Moshovos,et al. Bit-Pragmatic Deep Neural Network Computing , 2016, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[22] Swagath Venkataramani,et al. PACT: Parameterized Clipping Activation for Quantized Neural Networks , 2018, ArXiv.
[23] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[24] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[25] Hubert Eichner,et al. Federated Learning for Mobile Keyboard Prediction , 2018, ArXiv.
[26] Ran El-Yaniv,et al. Binarized Neural Networks , 2016, ArXiv.
[27] Atri Rudra,et al. Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations , 2019, ICML.
[28] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[29] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[30] Alexander Rush,et al. AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference , 2019, ArXiv.
[31] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[32] Magnus Jahre,et al. Streamlined Deployment for Quantized Neural Networks , 2017, ArXiv.
[33] Hadi Esmaeilzadeh,et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network , 2017, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[34] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res.
[35] Mayank Bansal,et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst , 2018, Robotics: Science and Systems.
[36] Ran El-Yaniv,et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res.