暂无分享,去创建一个
[1] Manfred K. Warmuth,et al. Relative Loss Bounds for Multidimensional Regression Problems , 1997, Machine Learning.
[2] Martin Jaggi,et al. Decoupling Backpropagation using Constrained Optimization Methods , 2018 .
[3] Frederik Kunstner,et al. Limitations of the empirical Fisher approximation for natural gradient descent , 2019, NeurIPS.
[4] Manfred K. Warmuth,et al. An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint , 2019, AAAI.
[5] AmariShun-Ichi. α-divergence is unique, belonging to both f-divergence and Bregman divergence classes , 2009 .
[6] Yoshua Bengio,et al. Difference Target Propagation , 2014, ECML/PKDD.
[7] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[8] Leon Hirsch,et al. Fundamentals Of Convex Analysis , 2016 .
[9] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[10] Yoram Singer,et al. Shampoo: Preconditioned Stochastic Tensor Optimization , 2018, ICML.
[11] Donald Goldfarb,et al. Practical Quasi-Newton Methods for Training Deep Neural Networks , 2020, NeurIPS.
[12] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[13] Manfred K. Warmuth,et al. Robust Bi-Tempered Logistic Loss Based on Bregman Divergences , 2019, NeurIPS.
[14] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .
[15] Roland Vollgraf,et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.
[16] Ziming Zhang,et al. Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks , 2017, NIPS.
[17] Yuan Yao,et al. A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training , 2018, ICLR.
[18] Jia Li,et al. Lifted Proximal Operator Machines , 2018, AAAI.
[19] Yuan Yao,et al. Global Convergence of Block Coordinate Descent in Deep Learning , 2018, ICML.
[20] Roger B. Grosse,et al. Distributed Second-Order Optimization using Kronecker-Factored Approximations , 2016, ICLR.
[21] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[22] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[23] Christopher Zach,et al. Contrastive Learning for Lifted Networks , 2019, BMVC.
[24] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[25] Miguel Á. Carreira-Perpiñán,et al. ParMAC: distributed optimisation of nested functions, with application to learning binary autoencoders , 2016, MLSys.
[26] Tom Heskes,et al. On Natural Learning and Pruning in Multilayered Perceptrons , 2000, Neural Computation.
[27] Satoshi Matsuoka,et al. Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs , 2018, ArXiv.
[28] Miguel Á. Carreira-Perpiñán,et al. Distributed optimization of deeply nested systems , 2012, AISTATS.
[29] Manfred K. Warmuth,et al. Reparameterizing Mirror Descent as Gradient Descent , 2020, NeurIPS.
[30] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[31] Laurent El Ghaoui,et al. Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training , 2018, AISTATS.
[32] Tie-Yan Liu,et al. On the Local Hessian in Back-propagation , 2018, NeurIPS.
[33] Yoram Singer,et al. Memory-Efficient Adaptive Optimization for Large-Scale Learning , 2019, ArXiv.
[34] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[35] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[36] Manfred K. Warmuth,et al. Relative loss bounds for single neurons , 1999, IEEE Trans. Neural Networks.
[37] Michael Möller,et al. Proximal Backpropagation , 2017, ICLR.
[38] Babak Hassibi,et al. The p-norm generalization of the LMS algorithm for adaptive filtering , 2003, IEEE Transactions on Signal Processing.
[39] Zheng Xu,et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach , 2016, ICML.
[40] Ali H. Sayed,et al. H∞ optimality of the LMS algorithm , 1996, IEEE Trans. Signal Process..