Dan Alistarh | Daniel M. Roy | Ilia Markov | Ali Ramezani-Kebrya | Fartash Faghri | Iman Tabrizian
[1] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[2] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[3] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[4] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[5] John Langford, et al. Scaling up machine learning: parallel and distributed approaches, 2011, KDD '11 Tutorials.
[6] G. Hua, et al. LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks, 2018, ECCV.
[7] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[8] Tianbao Yang, et al. Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization, 2016, arXiv:1604.03257.
[9] Trishul M. Chilimbi, et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System, 2014, OSDI.
[10] Yaoliang Yu, et al. Petuum: A New Platform for Distributed Machine Learning on Big Data, 2013, IEEE Transactions on Big Data.
[11] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[12] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.
[13] Jiawei Jiang, et al. Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript, 2020, ICML.
[14] Jaehoon Lee, et al. On Empirical Comparisons of Optimizers for Deep Learning, 2019, ArXiv.
[15] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[16] James L. Flanagan, et al. Adaptive quantization in differential PCM coding of speech, 1973.
[17] Shuchang Zhou, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients, 2016, ArXiv.
[18] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[19] Pritish Narayanan, et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[20] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[21] Marco Canini, et al. Natural Compression for Distributed Deep Learning, 2019, MSML.
[22] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[23] Dan Alistarh, et al. Distributed Mean Estimation with Optimal Error Bounds, 2020, ArXiv.
[24] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[25] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[26] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[27] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[28] Daniel M. Roy, et al. NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization, 2019, ArXiv.
[29] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[30] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[31] Sébastien Bubeck, et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[32] Yann LeCun, et al. Deep learning with Elastic Averaging SGD, 2014, NIPS.
[33] John C. Duchi, et al. Asynchronous stochastic convex optimization, 2015, arXiv:1508.00882.