Reparameterizing Mirror Descent as Gradient Descent
[1] Wojciech Kotlowski, et al. A case where a spindly two-layer linear network whips any neural network with a fully connected input layer, 2020, ArXiv.
[2] Manfred K. Warmuth, et al. Winnowing with Gradient Descent, 2020, COLT.
[3] Varun Kanade, et al. Implicit Regularization for Optimal Sparse Recovery, 2019, NeurIPS.
[4] Manfred K. Warmuth, et al. Robust Bi-Tempered Logistic Loss Based on Bregman Divergences, 2019, NeurIPS.
[5] Yoram Singer, et al. Exponentiated Gradient Meets Gradient Descent, 2019, ArXiv abs/1902.01903.
[6] Nathan Srebro, et al. Implicit Bias of Gradient Descent on Linear Convolutional Networks, 2018, NeurIPS.
[7] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[8] Sayan Mukherjee, et al. The Information Geometry of Mirror Descent, 2013, IEEE Transactions on Information Theory.
[9] Maxim Raginsky, et al. Continuous-time stochastic Mirror Descent on a network: Variance reduction, consensus, convergence, 2012, 51st IEEE Conference on Decision and Control (CDC).
[10] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[11] William H. Sandholm, et al. Population Games and Evolutionary Dynamics, 2010, Economic Learning and Social Evolution.
[12] Andrzej Cichocki, et al. Families of Alpha-, Beta-, and Gamma-Divergences: Flexible and Robust Measures of Similarities, 2010, Entropy.
[13] H. Zou, et al. Addendum: Regularization and variable selection via the elastic net, 2005.
[14] Y. Mansour, et al. Improved second-order bounds for prediction with expert advice, 2005, Machine Learning.
[15] S. V. N. Vishwanathan, et al. Leaving the Span, 2005, COLT.
[16] J. Naudts. Deformed exponentials and logarithms in generalized thermostatistics, 2002, cond-mat/0203489.
[17] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[18] Manfred K. Warmuth, et al. The Perceptron Algorithm Versus Winnow: Linear Versus Logarithmic Mistake Bounds when Few Input Variables are Relevant (Technical Note), 1997, Artif. Intell.
[19] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.
[20] Manfred K. Warmuth, et al. The perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant, 1995, COLT '95.
[21] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, 30th Annual Symposium on Foundations of Computer Science.
[22] W. L. Burke. Applied Differential Geometry, 1985.
[23] R. Rockafellar. Monotone Operators and the Proximal Point Algorithm, 1976.
[24] Jiazhong Nie, et al. Online PCA with Optimal Regret, 2016, J. Mach. Learn. Res.
[25] Kathrin Abendroth, et al. The Geometry of Population Genetics, 2016.
[26] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.
[27] Manfred K. Warmuth, et al. The p-Norm Generalization of the LMS Algorithm for Adaptive Filtering, 2022.