Yisong Yue | Anima Anandkumar | Ming-Yu Liu | Markus Meister | Jiawei Zhao | Jeremy Bernstein