RATQ: A Universal Fixed-Length Quantizer for Stochastic Optimization
[1] Himanshu Tyagi,et al. Inference Under Information Constraints II: Communication Constraints and Shared Randomness , 2019, IEEE Transactions on Information Theory.
[2] S. Agaian. Hadamard Matrices and Their Applications , 1985 .
[3] Ilya Dumer. Covering Spheres with Spheres , 2007, Discret. Comput. Geom.
[4] Jon Hamkins,et al. Design and analysis of spherical codes , 1996 .
[5] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[6] David P. Woodruff,et al. Communication lower bounds for statistical estimation problems via a distributed data processing inequality , 2015, STOC.
[7] Xiaowei Hu,et al. (Bandit) Convex Optimization with Biased Noisy Gradient Oracles , 2015, AISTATS.
[8] Jacob Ziv,et al. On universal quantization , 1985, IEEE Trans. Inf. Theory.
[9] Jon Hamkins,et al. Asymptotically dense spherical codes - Part II: Laminated spherical codes , 1997, IEEE Trans. Inf. Theory.
[10] Pritish Narayanan,et al. Deep Learning with Limited Numerical Precision , 2015, ICML.
[11] Martin J. Wainwright,et al. Information-theoretic lower bounds on the oracle complexity of convex optimization , 2009, NIPS.
[12] Daniel M. Roy,et al. NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization , 2019, ArXiv.
[13] Raj Kumar Maity,et al. vqSGD: Vector Quantized Stochastic Gradient Descent , 2019, IEEE Transactions on Information Theory.
[14] Himanshu Tyagi,et al. Inference Under Information Constraints I: Lower Bounds From Chi-Square Contraction , 2018, IEEE Transactions on Information Theory.
[15] Jon Hamkins,et al. Gaussian source coding with spherical codes , 2002, IEEE Trans. Inf. Theory.
[16] H. Robbins. A Stochastic Approximation Method , 1951 .
[17] Jon Hamkins,et al. Asymptotically dense spherical codes - Part I: Wrapped spherical codes , 1997, IEEE Trans. Inf. Theory.
[18] Nathan Srebro,et al. Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory , 2019, COLT.
[19] Martin J. Wainwright,et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints , 2013, NIPS.
[20] Christopher De Sa,et al. Distributed Learning with Sublinear Communication , 2019, ICML.
[21] Kunle Olukotun,et al. Taming the Wild: A Unified Analysis of Hogwild-Style Algorithms , 2015, NIPS.
[22] Meir Feder,et al. Low-Density Lattice Codes , 2007, IEEE Transactions on Information Theory.
[23] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn.
[24] A. Wyner. Random packings and coverings of the unit n-sphere , 1967 .
[25] Yanjun Han,et al. Geometric Lower Bounds for Distributed Parameter Estimation Under Communication Constraints , 2018, IEEE Transactions on Information Theory.
[26] Martin Jaggi,et al. Sparsified SGD with Memory , 2018, NeurIPS.
[27] Roger D. Hersch,et al. Rotated dispersed dither: a new technique for digital halftoning , 1994, SIGGRAPH.
[28] Peter Richtárik,et al. Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.
[29] Martin J. Wainwright,et al. Optimality guarantees for distributed statistical estimation , 2014, 1405.0782.
[30] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[31] A. Lapidoth. On the role of mismatch in rate distortion theory , 1995, Proceedings of 1995 IEEE International Symposium on Information Theory.
[32] John C. Duchi. Introductory lectures on stochastic optimization , 2018, IAS/Park City Mathematics Series.
[33] Suhas Diggavi,et al. Qsparse-Local-SGD: Distributed SGD With Quantization, Sparsification, and Local Computations , 2019, IEEE Journal on Selected Areas in Information Theory.
[34] Allen Gersho,et al. Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.
[35] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[36] Dimitris S. Papailiopoulos,et al. ATOMO: Communication-efficient Learning via Atomic Sparsification , 2018, NeurIPS.
[37] R. Gallager. Information Theory and Reliable Communication , 1968 .
[38] Sanjiv Kumar,et al. cpSGD: Communication-efficient and differentially-private distributed SGD , 2018, NeurIPS.
[39] Roman Vershynin,et al. Uncertainty Principles and Vector Quantization , 2006, IEEE Transactions on Information Theory.
[40] Uri Erez,et al. Dithered Quantization via Orthogonal Transformations , 2016, IEEE Transactions on Signal Processing.
[41] Ohad Shamir,et al. Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation , 2013, NIPS.
[42] Yanfei Yan,et al. Polar lattices: Where Arıkan meets Forney , 2013, 2013 IEEE International Symposium on Information Theory.
[43] Martin J. Wainwright,et al. Low density codes achieve the rate-distortion bound , 2006, Data Compression Conference (DCC'06).
[44] Gábor Lugosi,et al. Concentration Inequalities - A Nonasymptotic Theory of Independence , 2013, Concentration Inequalities.
[45] Himanshu Tyagi,et al. Extra Samples can Reduce the Communication for Independence Testing , 2018, 2018 IEEE International Symposium on Information Theory (ISIT).
[46] Maxim Raginsky,et al. Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation , 2016, IEEE Transactions on Information Theory.
[47] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.
[48] Bernard Chazelle,et al. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform , 2006, STOC '06.
[49] Prakash Narayan,et al. Gaussian arbitrarily varying channels , 1987, IEEE Trans. Inf. Theory.
[50] Imre Csiszár,et al. Capacity of the Gaussian arbitrarily varying channel , 1991, IEEE Trans. Inf. Theory.
[51] Kenneth Rose,et al. On Constrained Randomized Quantization , 2012, IEEE Transactions on Signal Processing.
[52] A. Lapidoth. On the role of mismatch in rate distortion theory , 1997, IEEE Trans. Inf. Theory.
[53] Tengyu Ma,et al. On Communication Cost of Distributed Statistical Estimation and Dimensionality , 2014, NIPS.
[54] Ananda Theertha Suresh,et al. Distributed Mean Estimation with Limited Communication , 2016, ICML.
[55] Martin Jaggi,et al. Error Feedback Fixes SignSGD and other Gradient Compression Schemes , 2019, ICML.
[56] Martin J. Wainwright,et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization , 2010, IEEE Transactions on Information Theory.