Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

We present a primal-only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold whose metric tensor is the Hessian of the Mirror Descent potential. We contrast this with Natural Gradient Descent, which is obtained by a "full" forward-Euler discretization of the same flow. This view sheds light on the relationship between the two methods and lets us generalize Mirror Descent to arbitrary Riemannian geometries, even when the metric tensor is not a Hessian and there is thus no "dual."
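As a sketch of the construction (notation chosen here, not taken from the paper): let $\phi$ be a strictly convex potential, and equip the primal space with the Hessian metric $\nabla^2 \phi$. Riemannian gradient flow on an objective $f$ is

    \dot{w}(t) = -\big(\nabla^2 \phi(w(t))\big)^{-1} \nabla f(w(t)),
    \quad \text{equivalently} \quad
    \frac{d}{dt}\, \nabla\phi(w(t)) = -\nabla f(w(t)).

A "full" forward-Euler step freezes the entire right-hand side at the current iterate $w_k$, yielding Natural Gradient Descent:

    w_{k+1} = w_k - \eta \big(\nabla^2 \phi(w_k)\big)^{-1} \nabla f(w_k).

A "partial" discretization freezes only the driving gradient, following $\dot{w}(t) = -(\nabla^2 \phi(w(t)))^{-1} \nabla f(w_k)$ exactly over the step; since $\frac{d}{dt}\nabla\phi(w(t))$ is then constant, this integrates in closed form to the Mirror Descent update

    \nabla\phi(w_{k+1}) = \nabla\phi(w_k) - \eta\, \nabla f(w_k).

A minimal numerical sketch of the two updates, assuming the negative-entropy potential $\phi(w) = \sum_i w_i \log w_i$ on the positive orthant, where $\nabla\phi(w) = 1 + \log w$ and $\nabla^2 \phi(w) = \mathrm{diag}(1/w)$; the objective and its gradient (`target`, `grad_f`) are hypothetical placeholders:

    import numpy as np

    # Hypothetical objective: f(w) = 0.5 * ||w - target||^2, so grad f(w) = w - target.
    target = np.array([0.2, 0.5, 0.3])

    def grad_f(w):
        return w - target

    eta = 0.1           # step size
    w_md = np.ones(3)   # Mirror Descent iterate
    w_ng = np.ones(3)   # Natural Gradient Descent iterate

    for _ in range(200):
        # Mirror Descent: exact flow of the metric with the gradient frozen.
        # For negative entropy the dual map is log, so the update is multiplicative:
        # log w_{k+1} = log w_k - eta * grad_f(w_k).
        w_md = w_md * np.exp(-eta * grad_f(w_md))

        # Natural Gradient Descent: full forward-Euler step,
        # w_{k+1} = w_k - eta * diag(w_k) grad_f(w_k) = w_k - eta * w_k * grad_f(w_k).
        w_ng = w_ng - eta * w_ng * grad_f(w_ng)

    # Both iterates converge toward `target`; the updates agree to first
    # order in eta but differ at O(eta^2).
    print(w_md, w_ng)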
