Christopher De Sa | Karthik Sridharan | Dylan J. Foster | Jayadev Acharya
[1] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Christopher Ré, et al. Asynchronous stochastic convex optimization: the noise is in the noise and SGD don't care, 2015, NIPS.
[3] Tengyu Ma, et al. On Communication Cost of Distributed Statistical Estimation and Dimensionality, 2014, NIPS.
[4] G. Pisier. Remarques sur un résultat non publié de B. Maurey, 1981.
[5] Himanshu Tyagi, et al. Extra Samples can Reduce the Communication for Independence Testing, 2018, 2018 IEEE International Symposium on Information Theory (ISIT).
[6] Po-Ling Loh, et al. Support recovery without incoherence: A case for nonconvex regularization, 2014, arXiv.
[7] L. Jones. A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training, 1992.
[8] Arkadi Nemirovski, et al. Lectures on modern convex optimization - analysis, algorithms, and engineering applications, 2001, MPS-SIAM Series on Optimization.
[9] Tong Zhang, et al. Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints, 2010, SIAM J. Optim.
[10] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[11] Martin J. Wainwright, et al. Optimality guarantees for distributed statistical estimation, 2014, arXiv:1405.0782.
[12] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[13] Ambuj Tewari, et al. Regularization Techniques for Learning with Matrices, 2009, J. Mach. Learn. Res.
[14] Himanshu Tyagi, et al. Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction, 2018, IEEE Transactions on Information Theory.
[15] Ambuj Tewari, et al. Smoothness, Low Noise and Fast Rates, 2010, NIPS.
[16] Stephen J. Wright, et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, 2011, NIPS.
[17] John Langford, et al. Scaling up machine learning: parallel and distributed approaches, 2011, KDD '11 Tutorials.
[18] K. Ball, et al. Sharp uniform convexity and smoothness inequalities for trace norms, 1994.
[19] Ohad Shamir, et al. Space lower bounds for linear prediction, 2019, arXiv.
[20] Dan Alistarh, et al. ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning, 2017, ICML.
[21] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.
[22] Martin J. Wainwright, et al. Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling, 2010, IEEE Transactions on Automatic Control.
[23] Ji Liu, et al. Gradient Sparsification for Communication-Efficient Distributed Optimization, 2017, NeurIPS.
[24] Ohad Shamir, et al. Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation, 2013, NIPS.
[25] Alexandre B. Tsybakov, et al. Optimal Rates of Aggregation, 2003, COLT.
[26] Hanlin Tang, et al. Communication Compression for Decentralized Training, 2018, NeurIPS.
[27] Ananda Theertha Suresh, et al. Distributed Mean Estimation with Limited Communication, 2016, ICML.
[28] David P. Woodruff, et al. Communication lower bounds for statistical estimation problems via a distributed data processing inequality, 2015, STOC.
[29] Martin J. Wainwright, et al. A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers, 2009, NIPS.
[30] Gregory Valiant, et al. Memory, Communication, and Statistical Queries, 2016, COLT.
[31] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[32] Tong Zhang, et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res.
[33] A. Juditsky, et al. Deterministic and Stochastic Primal-Dual Subgradient Algorithms for Uniformly Convex Minimization, 2014.
[34] Kamyar Azizzadenesheli, et al. signSGD: compressed optimisation for non-convex problems, 2018, ICML.
[35] Shai Shalev-Shwartz, et al. Near-Optimal Algorithms for Online Matrix Prediction, 2012, COLT.
[36] O. Shamir, et al. Detecting Correlations with Little Memory and Communication, 2018.
[37] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[38] Daniel M. Kane, et al. A Derandomized Sparse Johnson-Lindenstrauss Transform, 2010, Electron. Colloquium Comput. Complex.
[39] Ran Raz. Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning, 2018.
[40] G. Pisier. Martingales in Banach Spaces, 2016.
[41] Himanshu Tyagi, et al. Distributed Simulation and Distributed Inference, 2018, Electron. Colloquium Comput. Complex.
[42] Dan Alistarh, et al. The Convergence of Sparsified Gradient Methods, 2018, NeurIPS.
[43] John C. Duchi, et al. Minimax rates for memory-bounded sparse linear regression, 2015, COLT.
[44] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[45] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[46] Yanjun Han, et al. Geometric Lower Bounds for Distributed Parameter Estimation Under Communication Constraints, 2018, IEEE Transactions on Information Theory.
[47] Martin Jaggi, et al. Sparsified SGD with Memory, 2018, NeurIPS.
[48] Ohad Shamir, et al. Optimal Distributed Online Prediction Using Mini-Batches, 2010, J. Mach. Learn. Res.
[49] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.