Recht-Ré Noncommutative Arithmetic-Geometric Mean Conjecture is False

Stochastic optimization algorithms have become indispensable in modern machine learning. An unresolved foundational question in this area is how with-replacement sampling compares with without-replacement sampling: does the latter have a superior convergence rate? A groundbreaking result of Recht and Ré reduces the problem to a noncommutative analogue of the arithmetic-geometric mean inequality in which $n$ positive numbers are replaced by $n$ positive definite matrices. If this inequality holds for all $n$, then without-replacement sampling indeed outperforms with-replacement sampling. The conjectured Recht-Ré inequality has so far been established only for $n = 2$ and a special case of $n = 3$. We show that the Recht-Ré conjecture is false for general $n$. Our approach relies on the noncommutative Positivstellensatz, which allows us to reduce the conjectured inequality to a semidefinite program and the validity of the conjecture to certain bounds on the optimum values, which we show fail as soon as $n = 5$.
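
To make the inequality concrete, here is a minimal numerical sketch. It assumes the form in which the conjecture is commonly stated (the abstract does not spell it out): for positive semidefinite $A_1, \dots, A_n$ and $m \le n$, the spectral norm of the average of all $n^m$ ordered products $A_{j_1} \cdots A_{j_m}$ is at least the spectral norm of the average over the $n!/(n-m)!$ products with distinct indices. The function names `agm_sides` and `random_psd` are hypothetical, and the brute-force enumeration is only an illustration; it is not the semidefinite programming approach used in the paper.

```python
import itertools
import math

import numpy as np


def agm_sides(mats, m):
    """Return (with-replacement side, without-replacement side).

    Assumed form of the conjectured inequality: lhs >= rhs, where both
    sides are spectral norms of averaged products of m matrices.
    """
    n = len(mats)
    d = mats[0].shape[0]
    with_repl = np.zeros((d, d))
    without_repl = np.zeros((d, d))
    # Enumerate every ordered index tuple (j_1, ..., j_m).
    for idx in itertools.product(range(n), repeat=m):
        prod = np.eye(d)
        for j in idx:
            prod = prod @ mats[j]
        with_repl += prod
        if len(set(idx)) == m:  # all indices distinct
            without_repl += prod
    lhs = np.linalg.norm(with_repl, 2) / n**m
    rhs = np.linalg.norm(without_repl, 2) * math.factorial(n - m) / math.factorial(n)
    return lhs, rhs


def random_psd(d, rng):
    """Random d x d positive semidefinite matrix G G^T."""
    g = rng.standard_normal((d, d))
    return g @ g.T


rng = np.random.default_rng(0)
mats = [random_psd(3, rng) for _ in range(2)]
lhs, rhs = agm_sides(mats, 2)
print(f"n = 2, m = 2: with-replacement {lhs:.6f}, without-replacement {rhs:.6f}")
```

For $n = m = 2$ the two sides reduce to $\|(A_1 + A_2)^2\|/4$ and $\|A_1 A_2 + A_2 A_1\|/2$, the case in which the inequality is already known to hold. The enumeration costs $n^m$ matrix products, so it is practical only for small $n$, and random sampling of this kind should not be expected to find counterexamples on its own.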
