暂无分享,去创建一个
[1] Anit Kumar Sahu,et al. MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling , 2019, 2019 Sixth Indian Control Conference (ICC).
[2] Ohad Shamir,et al. Communication Complexity of Distributed Convex Learning and Optimization , 2015, NIPS.
[3] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[4] Andrew Gelman,et al. Handbook of Markov Chain Monte Carlo , 2011 .
[5] Eric Balkanski,et al. Parallelization does not Accelerate Convex Optimization: Adaptivity Lower Bounds for Non-smooth Convex Minimization , 2018, ArXiv.
[6] Yair Carmon,et al. Lower bounds for finding stationary points I , 2017, Mathematical Programming.
[7] Quanquan Gu,et al. Lower Bounds for Smooth Nonconvex Finite-Sum Optimization , 2019, ICML.
[8] Stephen P. Boyd,et al. Randomized gossip algorithms , 2006, IEEE Transactions on Information Theory.
[9] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[10] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence o(1/k^2) , 1983 .
[11] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[12] Aditya Sane,et al. Machine learning for predictive maintenance of industrial machines using IoT sensor data , 2017, 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS).
[13] Léon Bottou,et al. A Lower Bound for the Optimization of Finite Sums , 2014, ICML.
[14] William J. Dally,et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training , 2017, ICLR.
[15] Rong Jin,et al. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization , 2019, ICML.
[16] Laurent Massoulié,et al. Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks , 2017, ICML.
[17] N. U. Prabhu,et al. Stochastic Processes and Their Applications , 1999 .
[18] Martin Jaggi,et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , 2019, ICML.
[19] Ohad Shamir,et al. Oracle Complexity of Second-Order Methods for Finite-Sum Problems , 2016, ICML.
[20] Xin Yuan,et al. Bandwidth optimal all-reduce algorithms for clusters of workstations , 2009, J. Parallel Distributed Comput..
[21] Yi Zhou,et al. An optimal randomized incremental gradient method , 2015, Mathematical Programming.
[22] Michael G. Rabbat,et al. Stochastic Gradient Push for Distributed Deep Learning , 2018, ICML.
[23] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[24] Saeed Ghadimi,et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming , 2013, SIAM J. Optim..
[25] Tong Zhang,et al. SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator , 2018, NeurIPS.
[26] Leonidas Georgopoulos,et al. Definitive Consensus for Distributed Data Inference , 2011 .
[27] Stephen P. Boyd,et al. Gossip algorithms: design, analysis and applications , 2005, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies..
[28] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Jiaqi Zhang,et al. Asynchronous Decentralized Optimization in Directed Networks , 2019, ArXiv.
[30] Martin Jaggi,et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes , 2019, ICML.
[31] Ludovic Dos Santos,et al. Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning , 2019, NeurIPS.
[32] Yijun Huang,et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization , 2015, NIPS.
[33] Alexander J. Smola,et al. Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.
[34] Leonard J. Schulman,et al. On matrix factorization and scheduling for finite-time average-consensus , 2010 .
[35] Christopher De Sa,et al. MixML: A Unified Analysis of Weakly Consistent Parallel Learning , 2020, ArXiv.
[36] Martin Jaggi,et al. Decentralized Deep Learning with Arbitrary Communication Compression , 2019, ICLR.
[37] Martin Jaggi,et al. COLA: Decentralized Linear Learning , 2018, NeurIPS.
[38] Jianyu Wang,et al. Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms , 2018, ArXiv.
[39] Nathan Srebro,et al. Lower Bounds for Non-Convex Stochastic Optimization , 2019, ArXiv.
[40] Hanlin Tang,et al. Communication Compression for Decentralized Training , 2018, NeurIPS.
[41] Blaise Agüera y Arcas,et al. Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.
[42] Zeyuan Allen-Zhu,et al. How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD , 2018, NeurIPS.
[43] Raphaël M. Jungers,et al. Graph diameter, eigenvalues, and minimum-time consensus , 2012, Autom..
[44] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[45] Dan Alistarh. A Brief Tutorial on Distributed and Concurrent Machine Learning , 2018, PODC.
[46] Yair Carmon,et al. Lower bounds for finding stationary points II: first-order methods , 2017, Mathematical Programming.
[47] Christopher De Sa,et al. Moniqua: Modulo Quantized Communication in Decentralized SGD , 2020, ICML.
[48] Jelena Diakonikolas,et al. Lower Bounds for Parallel and Randomized Convex Optimization , 2018, COLT.
[49] Robert B. Ross,et al. Using MPI-2: Advanced Features of the Message Passing Interface , 2003, CLUSTER.
[50] Andreas Spanias,et al. A brief survey of machine learning methods and their sensor and IoT applications , 2017, 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA).
[51] Volkan Cevher,et al. An adaptive primal-dual framework for nonsmooth convex minimization , 2018, Mathematical Programming Computation.
[52] Wei Zhang,et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent , 2017, ICML.
[53] Lei Yuan,et al. $\texttt{DeepSqueeze}$: Decentralization Meets Error-Compensated Compression , 2019 .
[54] Laurent Massoulié,et al. Optimal Algorithms for Non-Smooth Distributed Optimization in Networks , 2018, NeurIPS.
[55] Dimitris S. Papailiopoulos,et al. ATOMO: Communication-efficient Learning via Atomic Sparsification , 2018, NeurIPS.
[56] George Michailidis,et al. DAdam: A Consensus-Based Distributed Adaptive Gradient Method for Online Optimization , 2018, IEEE Transactions on Signal Processing.
[57] Seunghak Lee,et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.
[58] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[59] Kunle Olukotun,et al. Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms , 2015, NIPS.
[60] B. Gerencsér. Markov chain mixing time on cycles , 2011 .
[61] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[62] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[63] Ohad Shamir,et al. The Complexity of Making the Gradient Small in Stochastic Convex Optimization , 2019, COLT.
[64] Martin Jaggi,et al. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates , 2020, ICML.
[65] Mingyi Hong,et al. Distributed Non-Convex First-Order optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms , 2018, 2018 52nd Asilomar Conference on Signals, Systems, and Computers.
[66] Ji Liu,et al. DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression , 2019, ICML.
[67] Dan Alistarh,et al. Distributed Learning over Unreliable Networks , 2018, ICML.
[68] Nathan Srebro,et al. Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization , 2018, NeurIPS.
[69] Dan Alistarh,et al. Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent , 2020, ArXiv.
[70] A. Gasnikov,et al. Decentralized and Parallelized Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems , 2019, 1904.09015.
[71] Xiaoxia Wu,et al. L ] 1 0 A pr 2 01 9 AdaGrad-Norm convergence over nonconvex landscapes AdaGrad stepsizes : sharp convergence over nonconvex landscapes , from any initialization , 2019 .
[72] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.