Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training