Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
Brian Kingsbury | Viatcheslav Gurev | Djallel Bouneffouf | Ronny Luss | Benjamin Cowen | Anna Choromanska | Irina Rish | Ravi Tejwani | Mattia Rigotti | Sadhana Kumaravel | Paolo Diachille
[1] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[2] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[3] Mark W. Schmidt, et al. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches, 2007, ECML.
[4] Sebastian Thrun, et al. Lifelong Learning Algorithms, 1998, Learning to Learn.
[5] Quoc V. Le, et al. On optimization methods for deep learning, 2011, ICML.
[6] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[7] Yoshua Bengio, et al. BinaryConnect: Training Deep Neural Networks with binary weights during propagations, 2015, NIPS.
[8] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[9] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[10] Martin Jaggi, et al. Decoupling Backpropagation using Constrained Optimization Methods, 2018.
[11] P. Baldi, et al. Searching for exotic particles in high-energy physics with deep learning, 2014, Nature Communications.
[12] Thomas Villmann, et al. Applications of lp-Norms and their Smooth Approximations for Gradient Based Learning Vector Quantization, 2014, ESANN.
[13] Yoshua Bengio, et al. Dendritic cortical microcircuits approximate the backpropagation algorithm, 2018, NeurIPS.
[14] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[15] Geoffrey E. Hinton, et al. Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures, 2018, NeurIPS.
[16] H. Robbins. A Stochastic Approximation Method, 1951.
[17] Tim Tsz-Kit Lau, et al. Global Convergence in Deep Learning with Variable Splitting via the Kurdyka-Łojasiewicz Property, 2018.
[18] Miguel Á. Carreira-Perpiñán, et al. Distributed optimization of deeply nested systems, 2012, AISTATS.
[19] Timothy P. Lillicrap, et al. Towards deep learning with segregated dendrites, 2016, eLife.
[20] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[21] Yuan Yao, et al. Block Coordinate Descent for Deep Learning: Unified Convergence Guarantees, 2018, ArXiv.
[22] Yann LeCun, et al. The MNIST database of handwritten digits, 2005.
[23] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[24] Venkatesh Saligrama, et al. Efficient Training of Very Deep Neural Networks for Supervised Hashing, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Yoshua Bengio, et al. Difference Target Propagation, 2014, ECML/PKDD.
[26] Yuan Yao, et al. A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training, 2018, ICLR.
[27] Geoffrey E. Hinton, et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, 2015, ArXiv.
[28] Sebastian Thrun, et al. A lifelong learning perspective for mobile robot control, 1994, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'94).
[29] Yann LeCun, et al. Modèles connexionnistes de l'apprentissage (Connectionist models of learning), 1987.
[30] Yann LeCun, et al. Learning processes in an asymmetric threshold network, 1986.
[31] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.
[32] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[33] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[35] James C. R. Whittington, et al. Theories of Error Back-Propagation in the Brain, 2019, Trends in Cognitive Sciences.
[36] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.
[37] Martin J. Wainwright, et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, ArXiv.
[38] W. Bastiaan Kleijn, et al. Training Deep Neural Networks via Optimization Over Graphs, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Guillermo Sapiro, et al. Online dictionary learning for sparse coding, 2009, ICML '09.
[40] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[41] Ziming Zhang, et al. Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks, 2017, NIPS.
[42] Michael Möller, et al. Proximal Backpropagation, 2017, ICLR.
[43] John J. Hopfield, et al. Unsupervised learning by competing hidden units, 2018, Proceedings of the National Academy of Sciences.
[44] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[45] Zheng Xu, et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach, 2016, ICML.
[46] Yann Le Cun. A Theoretical Framework for Back-Propagation, 1988.