Samuel Horváth | Chen-Yu Ho | Ludovit Horváth | Atal Narayan Sahu | Marco Canini | Peter Richtárik
[1] W. M. Goodall. Television by pulse code modulation, 1951.
[2] John Langford et al. Scaling up machine learning: parallel and distributed approaches, 2011, KDD '11 Tutorials.
[3] Peter Richtárik et al. Distributed Coordinate Descent Method for Learning with Big Data, 2016.
[4] Stephen J. Wright et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[5] Sam Ade Jacobs et al. Communication Quantization for Data-Parallel Training of Deep Neural Networks, 2016, 2nd Workshop on Machine Learning in HPC Environments (MLHPC).
[6] Pritish Narayanan et al. Deep Learning with Limited Numerical Precision, 2015, ICML.
[7] Kenneth Heafield et al. Sparse Communication for Distributed Gradient Descent, 2017, EMNLP.
[8] Konstantin Mishchenko et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[9] Peter Richtárik et al. Nonconvex Variance Reduced Optimization with Arbitrary Sampling, 2018, ICML.
[10] Kilian Q. Weinberger et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Peng Jiang et al. A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication, 2018, NeurIPS.
[12] Alexander J. Smola et al. AIDE: Fast and Communication Efficient Distributed Optimization, 2016, ArXiv.
[13] Sébastien Bubeck et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.
[14] Konstantin Mishchenko et al. 99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it, 2019.
[15] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Sebastian U. Stich et al. Stochastic Distributed Learning with Gradient Quantization and Variance Reduction, 2019, arXiv:1904.05115.
[17] Peter Richtárik et al. Federated Learning: Strategies for Improving Communication Efficiency, 2016, ArXiv.
[18] Gideon S. Mann et al. Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models, 2009, NIPS.
[19] William J. Dally et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[20] Ananda Theertha Suresh et al. Distributed Mean Estimation with Limited Communication, 2016, ICML.
[21] Peter Richtárik et al. Distributed Learning with Compressed Gradient Differences, 2019, ArXiv.
[22] Dong Yu et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[23] Sarit Khirirat et al. Distributed learning with compressed gradients, 2018, arXiv:1806.06573.
[24] François Fleuret et al. Not All Samples Are Created Equal: Deep Learning with Importance Sampling, 2018, ICML.
[25] Ohad Shamir et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method, 2013, ICML.
[26] Peter Richtárik et al. Fast distributed coordinate descent for non-strongly convex losses, 2014, IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
[27] Kaiming He et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[28] Yurii Nesterov et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[29] H. Robbins. A Stochastic Approximation Method, 1951.
[30] Cong Xu et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[31] Jérôme Malick et al. Asynchronous Distributed Learning with Sparse Communications and Identification, 2018, ArXiv.
[32] Ran El-Yaniv et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res.
[33] Martin Jaggi et al. Sparsified SGD with Memory, 2018, NeurIPS.
[34] Hyeontaek Lim et al. 3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning, 2018, MLSys.
[35] Saibal Mukhopadhyay et al. On-chip training of recurrent neural networks with limited numerical precision, 2017, International Joint Conference on Neural Networks (IJCNN).
[36] Dan Alistarh et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[37] Panos Kalnis et al. Scaling Distributed Machine Learning with In-Network Aggregation, 2019, NSDI.
[38] Sebastian U. Stich et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[39] Peter Richtárik et al. 99% of Parallel Optimization is Inevitably a Waste of Time, 2019, ArXiv.
[40] Ji Liu et al. Gradient Sparsification for Communication-Efficient Distributed Optimization, 2017, NeurIPS.
[41] Tong Zhang et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[42] Massih-Reza Amini et al. Distributed Learning with Sparse Communications by Identification, 2018.
[43] Lawrence G. Roberts et al. Picture coding using pseudo-random noise, 1962, IRE Trans. Inf. Theory.
[44] Alexander J. Smola et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[45] Francis Bach et al. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives, 2014, NIPS.
[46] Dan Alistarh et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[47] Peter Richtárik et al. Randomized Distributed Mean Estimation: Accuracy vs. Communication, 2016, Front. Appl. Math. Stat.
[48] Dan Alistarh et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[49] Saeed Ghadimi et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.