暂无分享,去创建一个
[1] Shenghuo Zhu,et al. Parallel Restarted SGD for Non-Convex Optimization with Faster Convergence and Less Communication , 2018, ArXiv.
[2] Taghi M. Khoshgoftaar,et al. Large-scale distributed L-BFGS , 2017, Journal of Big Data.
[3] John C. Duchi,et al. Distributed delayed stochastic optimization , 2011, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[4] Yann LeCun,et al. Deep learning with Elastic Averaging SGD , 2014, NIPS.
[5] Ioannis Mitliagkas,et al. Parallel SGD: When does averaging help? , 2016, ArXiv.
[6] Fabian Pedregosa,et al. Improved asynchronous parallel optimization analysis for stochastic incremental methods , 2018, J. Mach. Learn. Res..
[7] Jonathan D. Rosenblatt,et al. On the Optimality of Averaging in Distributed Statistical Learning , 2014, 1407.2724.
[8] Yuchen Zhang,et al. Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss , 2015, ArXiv.
[9] Seunghak Lee,et al. On Model Parallelization and Scheduling Strategies for Distributed Machine Learning , 2014, NIPS.
[10] Alexander J. Smola,et al. Communication Efficient Distributed Machine Learning with the Parameter Server , 2014, NIPS.
[11] Mark W. Schmidt,et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method , 2012, ArXiv.
[12] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[13] Gideon S. Mann,et al. Distributed Training Strategies for the Structured Perceptron , 2010, NAACL.
[14] Dan Alistarh,et al. The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory , 2018, PODC.
[15] Wu-Jun Li,et al. Fast Asynchronous Parallel Stochastic Gradient Decent , 2015, ArXiv.
[16] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[17] David P. Woodruff,et al. Communication lower bounds for statistical estimation problems via a distributed data processing inequality , 2015, STOC.
[18] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[19] Patrice Marcotte,et al. Co-Coercivity and Its Role in the Convergence of Iterative Schemes for Solving Variational Inequalities , 1996, SIAM J. Optim..
[20] Tao Lin,et al. Don't Use Large Mini-Batches, Use Local SGD , 2018, ICLR.
[21] Avleen Singh Bijral,et al. Mini-Batch Primal and Dual Methods for SVMs , 2013, ICML.
[22] V. Fabian. On Asymptotic Normality in Stochastic Approximation , 1968 .
[23] Sofiane Saadane,et al. On the rates of convergence of parallelized averaged stochastic gradient algorithms , 2017, Statistics.
[24] Kunle Olukotun,et al. Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms , 2015, NIPS.
[25] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[26] Laurent Massoulié,et al. Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks , 2017, ICML.
[27] F. Bach,et al. NONPARAMETRIC STOCHASTIC APPROXIMATION WITH LARGE STEP-SIZES1 BY AYMERIC DIEULEVEUT , 2016 .
[28] Sebastian U. Stich,et al. Local SGD Converges Fast and Communicates Little , 2018, ICLR.
[29] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[30] Prateek Jain,et al. Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging , 2016, ArXiv.
[31] Hamid Reza Feyzmahdavian,et al. An asynchronous mini-batch algorithm for regularized stochastic optimization , 2015, CDC.
[32] Fabian Pedregosa,et al. Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization , 2017, NIPS.
[33] Dimitris S. Papailiopoulos,et al. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization , 2015, SIAM J. Optim..
[34] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[35] Martin J. Wainwright,et al. Optimality guarantees for distributed statistical estimation , 2014, 1405.0782.
[36] Ali Taylan Cemgil,et al. Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization , 2018, ICML.
[37] Michael I. Jordan,et al. Adding vs. Averaging in Distributed Primal-Dual Optimization , 2015, ICML.
[38] Chih-Jen Lin,et al. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems , 2015, ACM Trans. Intell. Syst. Technol..
[39] Todd Mytkowicz,et al. Parallel Stochastic Gradient Descent with Sound Combiners , 2017, ArXiv.
[40] Alexander J. Smola,et al. AIDE: Fast and Communication Efficient Distributed Optimization , 2016, ArXiv.
[41] Peter Richtárik,et al. Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.
[42] Janis Keuper,et al. Asynchronous parallel stochastic gradient descent: a numeric core for scalable distributed machine learning algorithms , 2015, MLHPC@SC.
[43] Ohad Shamir,et al. Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes , 2012, ICML.
[44] Ji Liu,et al. Gradient Sparsification for Communication-Efficient Distributed Optimization , 2017, NeurIPS.
[45] Ohad Shamir,et al. Stochastic Convex Optimization , 2009, COLT.
[46] Alfred O. Hero,et al. Distributed principal component analysis on networks via directed graphical models , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Wu-Jun Li,et al. Fast Asynchronous Parallel Stochastic Gradient Descent: A Lock-Free Approach with Convergence Guarantee , 2016, AAAI.
[48] F. Bach,et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains , 2017, The Annals of Statistics.
[49] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[50] Alexander J. Smola,et al. Efficient mini-batch training for stochastic optimization , 2014, KDD.
[51] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[52] S. Gadat,et al. Optimal non-asymptotic bound of the Ruppert-Polyak averaging without strong convexity , 2017, 1709.03342.
[53] Dan Alistarh,et al. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning , 2016, ICML 2017.
[54] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[55] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[56] Francis R. Bach,et al. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression , 2013, J. Mach. Learn. Res..
[57] John C. Duchi,et al. Asynchronous stochastic convex optimization , 2015, 1508.00882.
[58] Gideon S. Mann,et al. Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models , 2009, NIPS.
[59] Prateek Jain,et al. Accelerating Stochastic Gradient Descent , 2017, ArXiv.
[60] Stephen P. Boyd,et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..
[61] Prateek Jain,et al. Accelerating Stochastic Gradient Descent , 2017, COLT.
[62] Martin Takác,et al. Partitioning Data on Features or Samples in Communication-Efficient Distributed Optimization? , 2015, ArXiv.
[63] Yijun Huang,et al. Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization , 2015, NIPS.
[64] Dan Alistarh,et al. QSGD: Randomized Quantization for Communication-Optimal Stochastic Gradient Descent , 2016, ArXiv.
[65] Martin J. Wainwright,et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints , 2013, NIPS.
[66] Tom Goldstein,et al. Efficient Distributed SGD with Variance Reduction , 2015, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[67] Sarit Khirirat,et al. Distributed learning with compressed gradients , 2018, 1806.06573.
[68] Wei Zhang,et al. Asynchronous Decentralized Parallel Stochastic Gradient Descent , 2017, ICML.
[69] Saibal Mukhopadhyay,et al. On-chip training of recurrent neural networks with limited numerical precision , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[70] Blaise Agüera y Arcas,et al. Federated Learning of Deep Networks using Model Averaging , 2016, ArXiv.
[71] Xiaoqian Jiang,et al. Fast and Robust Parallel SGD Matrix Factorization , 2015, KDD.
[72] Dan Alistarh,et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning , 2017, ICML.
[73] John Langford,et al. Slow Learners are Fast , 2009, NIPS.
[74] Ohad Shamir,et al. Communication Complexity of Distributed Convex Learning and Optimization , 2015, NIPS.
[75] Tianbao Yang,et al. Distributed Stochastic Variance Reduced Gradient Methods and A Lower Bound for Communication Complexity , 2015 .
[76] Diego Klabjan,et al. A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations , 2018, ArXiv.
[77] Tong Zhang,et al. Stochastic Optimization with Importance Sampling for Regularized Loss Minimization , 2014, ICML.
[78] Zhihua Zhang,et al. Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features , 2017, AAAI.
[79] Michael I. Jordan,et al. Distributed optimization with arbitrary local solvers , 2015, Optim. Methods Softw..
[80] Deanna Needell,et al. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm , 2013, Mathematical Programming.
[81] Michael I. Jordan,et al. CoCoA: A General Framework for Communication-Efficient Distributed Optimization , 2016, J. Mach. Learn. Res..
[82] Alexander J. Smola,et al. On Variance Reduction in Stochastic Gradient Descent and its Asynchronous Variants , 2015, NIPS.
[83] Ohad Shamir,et al. Optimal Distributed Online Prediction Using Mini-Batches , 2010, J. Mach. Learn. Res..
[84] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[85] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.
[86] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.
[87] Francis R. Bach,et al. Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions , 2015, AISTATS.
[88] Martin J. Wainwright,et al. Communication-efficient algorithms for statistical optimization , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[89] Thomas Paine,et al. GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training , 2013, ICLR.
[90] Bin Wu,et al. A Fast Distributed Stochastic Gradient Descent Algorithm for Matrix Factorization , 2014, BigMine.
[91] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[92] Chih-Jen Lin,et al. A fast parallel SGD for matrix factorization in shared memory systems , 2013, RecSys.
[93] Jakub Konecný,et al. Federated Optimization: Distributed Optimization Beyond the Datacenter , 2015, ArXiv.
[94] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[95] Yurii Nesterov,et al. Confidence level solutions for stochastic programming , 2000, Autom..
[96] Ohad Shamir,et al. Communication-Efficient Distributed Optimization using an Approximate Newton-type Method , 2013, ICML.
[97] Samy Bengio,et al. Revisiting Distributed Synchronous SGD , 2016, ArXiv.