Neural Network Training as an Optimal Control Problem: An Augmented Lagrangian Approach
Brecht Evens | Puya Latafat | Andreas Themelis | Johan Suykens | Panagiotis Patrinos
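For orientation, the general idea named in the title can be illustrated with a small self-contained sketch: treat the post-activation state of every layer as an explicit variable tied to the weights by an equality constraint, and minimize an augmented Lagrangian that relaxes those constraints. The NumPy code below is a toy illustration under assumed choices (tanh layers, squared loss, plain gradient descent as the inner solver, a fixed penalty schedule, made-up problem sizes); it is not the authors' algorithm.

```python
import numpy as np

# Minimal sketch: split the layer map z_{k+1} = tanh(W_k z_k) into explicit
# state variables with equality constraints c_k = z_{k+1} - tanh(W_k z_k) = 0,
# then minimize the augmented Lagrangian
#   loss(z_L) + sum_k [ lambda_k . c_k + (rho/2) ||c_k||^2 ].
# All sizes and hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)

def act(u):        # smooth activation so the hand-written gradients are exact
    return np.tanh(u)

def dact(u):       # derivative of tanh
    return 1.0 - np.tanh(u) ** 2

X = rng.standard_normal((4, 8))      # 4 input features, 8 samples (toy data)
Y = rng.standard_normal((1, 8))      # regression targets
dims = [4, 5, 5, 1]
L = len(dims) - 1                    # number of layers = number of constraints
Ws = [0.5 * rng.standard_normal((dims[k + 1], dims[k])) for k in range(L)]

zs = [X]                             # z_0 is clamped to the data
for W in Ws:                         # feasible start: an ordinary forward pass
    zs.append(act(W @ zs[-1]))
lams = [np.zeros_like(zs[k + 1]) for k in range(L)]  # one multiplier per constraint
rho, lr = 1.0, 1e-2

def residuals():
    # c_k = z_{k+1} - act(W_k z_k): how far each state is from its layer map
    return [zs[k + 1] - act(Ws[k] @ zs[k]) for k in range(L)]

for outer in range(20):
    for inner in range(200):         # crude inner solver: joint gradient descent
        cs = residuals()
        ps = [lams[k] + rho * cs[k] for k in range(L)]   # lambda_k + rho c_k
        pre = [Ws[k] @ zs[k] for k in range(L)]
        # weight gradient is local to layer k: -(p_k * act'(pre_k)) z_k^T
        gW = [-(ps[k] * dact(pre[k])) @ zs[k].T for k in range(L)]
        gz = [None] * (L + 1)
        for k in range(1, L):        # interior states see two constraint terms
            gz[k] = ps[k - 1] - Ws[k].T @ (ps[k] * dact(pre[k]))
        gz[L] = (zs[L] - Y) + ps[L - 1]   # output state also carries the loss
        for k in range(L):
            Ws[k] -= lr * gW[k]
        for k in range(1, L + 1):
            zs[k] -= lr * gz[k]
    cs = residuals()
    for k in range(L):               # classic first-order multiplier update
        lams[k] += rho * cs[k]
    rho = min(1.5 * rho, 10.0)       # mild penalty growth, capped for stability

print("max layer residual:", max(np.abs(c).max() for c in residuals()))
print("fit error:", np.linalg.norm(zs[-1] - Y))
```

The splitting makes each weight gradient local to its own layer, unlike backpropagation; practical augmented Lagrangian schemes replace the crude inner loop above with a proper approximate minimization and grow the penalty parameter only when the constraint residuals stall.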