Paper information - Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

We present a primal-only derivation of Mirror Descent as a “partial” discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential. We contrast this discretization with Natural Gradient Descent, which is obtained by a “full” forward Euler discretization. This view helps shed light on the relationship between the two methods and allows Mirror Descent to be generalized to arbitrary Riemannian geometries, even when the metric tensor is not a Hessian and thus there is no “dual.”

Nathan Srebro | Suriya Gunasekar | Blake E. Woodworth
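
The contrast drawn in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the potential (unnormalized entropy, Φ(w) = Σ w_i log w_i − w_i on the positive orthant), the quadratic objective, and all function names are assumptions chosen for illustration. With this potential, ∇Φ(w) = log w and the metric tensor ∇²Φ(w) = diag(1/w), so the textbook Mirror Descent update ∇Φ(w⁺) = ∇Φ(w) − η∇F(w) becomes a multiplicative step, while the textbook Natural Gradient Descent update w⁺ = w − η[∇²Φ(w)]⁻¹∇F(w) is a plain Euler step in the primal variable.

```python
import numpy as np

def grad_F(w, target):
    # Gradient of the illustrative objective F(w) = 0.5 * ||w - target||^2.
    return w - target

def mirror_descent_step(w, g, eta):
    # grad Phi(w) = log(w), so log(w_next) = log(w) - eta * g.
    return w * np.exp(-eta * g)

def natural_gradient_step(w, g, eta):
    # Hessian of Phi is diag(1 / w); its inverse is diag(w).
    return w - eta * w * g

def run(step, w0, target, eta, n_steps):
    # Iterate the given update rule from w0 for n_steps steps.
    w = w0.copy()
    for _ in range(n_steps):
        w = step(w, grad_F(w, target), eta)
    return w

target = np.array([0.5, 2.0, 1.5])   # hypothetical positive minimizer
w0 = np.ones(3)

w_md = run(mirror_descent_step, w0, target, eta=0.1, n_steps=500)
w_ngd = run(natural_gradient_step, w0, target, eta=0.1, n_steps=500)

# A single small step: the two updates agree to first order in eta.
eta_small = 1e-3
g0 = grad_F(w0, target)
one_step_gap = np.max(np.abs(mirror_descent_step(w0, g0, eta_small)
                             - natural_gradient_step(w0, g0, eta_small)))
```

In this toy setting both iterations approach the same minimizer, and their single-step difference shrinks quadratically with the step size, which is consistent with both being discretizations of the same underlying flow. The Mirror Descent step also stays positive for any step size, whereas the Natural Gradient step does so only when η is small enough.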
