Marc'Aurelio Ranzato | Ruoyu Sun | Sam Wiseman | Nicolas Vasilache | Sumit Chopra | Soumith Chintala | Arthur Szlam
[1] M. Hestenes. Multiplier and gradient methods, 1969.
[2] Zheng Xu, et al. Training Neural Networks Without Gradients: A Scalable ADMM Approach, 2016, ICML.
[3] Yoshua Bengio, et al. Difference Target Propagation, 2014, ECML/PKDD.
[4] Stephen J. Wright, et al. Numerical Optimization, 2018, Fundamental Statistical Inference.
[5] R. Courant. Variational methods for the solution of problems of equilibrium and vibrations, 1943.
[6] Yann LeCun, et al. Modeles connexionnistes de l'apprentissage, 1987.
[7] Yann LeCun, et al. Learning processes in an asymmetric threshold network, 1986.
[8] B. Mercier, et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation, 1976.
[9] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[10] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[11] Ying Zhang, et al. On Multiplicative Integration with Recurrent Neural Networks, 2016, NIPS.
[12] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.
[13] Anders Krogh, et al. A Cost Function for Internal Representations, 1989, NIPS.
[14] Yann LeCun, et al. A theoretical framework for back-propagation, 1988.
[15] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[16] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[17] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[18] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[19] R. Glowinski, et al. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires, 1975.
[20] Yoshua Bengio, et al. How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation, 2014, ArXiv.
[21] Miguel Á. Carreira-Perpiñán, et al. Hashing with binary autoencoders, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[23] R. Fergus, et al. Learning invariant features through topographic filter maps, 2009, IEEE Conference on Computer Vision and Pattern Recognition.
[24] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[25] Miguel Á. Carreira-Perpiñán, et al. Distributed optimization of deeply nested systems, 2012, AISTATS.
[26] Yann LeCun, et al. Dynamic Factor Graphs for Time Series Modeling, 2009, ECML/PKDD.
[27] M. J. D. Powell, et al. A method for nonlinear constraints in minimization problems, 1969.