Permutation-Based SGD: Is Random Optimal?

A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with-replacement sampling. However, is random optimal? We show that the answer depends heavily on which functions we are optimizing, and that the convergence gap between optimal and random permutations can vary from exponential to nonexistent. We first show that for one-dimensional strongly convex functions with smooth second derivatives, there exist permutations that offer exponentially faster convergence than random. However, for general strongly convex functions, random permutations are optimal. Finally, we show that for quadratic, strongly convex functions, there are easy-to-construct permutations that lead to accelerated convergence compared to random. Our results suggest that a general convergence characterization of optimal permutations cannot capture the nuances of individual function classes, and can mistakenly indicate that one cannot do much better than random.
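
To make the setting concrete, below is a minimal sketch (in Python/NumPy) of permutation-based SGD on a toy one-dimensional quadratic finite sum, comparing random reshuffling (a fresh permutation every epoch) against a single fixed permutation reused across epochs. The problem instance, step size, and permutation choices are illustrative assumptions for this example only; they are not the optimal or accelerated constructions studied in the paper.

```python
# Minimal sketch (illustrative, not the paper's constructions):
# permutation-based SGD on the toy finite sum
#     f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)^2.
import numpy as np

rng = np.random.default_rng(0)
n, epochs, lr = 32, 50, 0.05                 # assumed problem size and step size
a = rng.uniform(0.5, 2.0, size=n)            # per-component curvatures
b = rng.normal(size=n)                       # per-component offsets
x_star = np.sum(a * b) / np.sum(a**2)        # exact minimizer of the average

def grad(i, x):
    """Gradient of the i-th component f_i(x) = 0.5 * (a_i * x - b_i)^2."""
    return a[i] * (a[i] * x - b[i])

def run(permutation_rule, x0=0.0):
    """Run epochs of incremental SGD; permutation_rule(epoch) gives the visiting order."""
    x = x0
    for epoch in range(epochs):
        for i in permutation_rule(epoch):
            x -= lr * grad(i, x)
    return abs(x - x_star)

# Random reshuffling: draw a fresh permutation every epoch.
rr_error = run(lambda _: rng.permutation(n))

# One arbitrary fixed permutation reused every epoch, for comparison.
fixed = rng.permutation(n)
fixed_error = run(lambda _: fixed)

print(f"random reshuffling error: {rr_error:.3e}")
print(f"fixed permutation error:  {fixed_error:.3e}")
```

Experimenting with other orderings only requires swapping in a different `permutation_rule`, e.g. a hand-crafted permutation schedule in place of the random or fixed choices above.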
