Manifold Identification for Ultimately Communication-Efficient Distributed Optimization