Mean-normalized stochastic gradient for large-scale deep learning
暂无分享,去创建一个
Hermann Ney | Ralf Schlüter | Simon Wiesler | Alexander Richard | H. Ney | Simon Wiesler | Alexander Richard | R. Schlüter
[1] H. Robbins,et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications , 1985 .
[2] Jorge Nocedal,et al. On the limited memory BFGS method for large scale optimization , 1989, Math. Program..
[3] Yann LeCun,et al. Second Order Properties of Error Surfaces: Learning Time and Generalization , 1990, NIPS 1990.
[4] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[5] Martin A. Riedmiller,et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm , 1993, IEEE International Conference on Neural Networks.
[6] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.
[7] Simon Günter,et al. A Stochastic Quasi-Newton Method for Online Convex Optimization , 2007, AISTATS.
[8] S. V. N. Vishwanathan,et al. Variable Metric Stochastic Approximation Theory , 2009, AISTATS.
[9] Patrick Gallinari,et al. SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent , 2009, J. Mach. Learn. Res..
[10] James Martens,et al. Deep learning via Hessian-free optimization , 2010, ICML.
[11] Hermann Ney,et al. The RWTH 2010 Quaero ASR evaluation system for English, French, and German , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Hermann Ney,et al. A Convergence Analysis of Log-Linear Training , 2011, NIPS.
[13] Tara N. Sainath,et al. Making Deep Belief Networks effective for large vocabulary continuous speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[14] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[15] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.
[16] Dong Yu,et al. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Navdeep Jaitly,et al. Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition , 2012, INTERSPEECH.
[18] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[19] Tapani Raiko,et al. Deep Learning Made Easier by Linear Transformations in Perceptrons , 2012, AISTATS.
[20] Georg Heigold,et al. An empirical study of learning rates in deep neural networks for speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Ebru Arisoy,et al. Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Yifan Gong,et al. Restructuring of deep neural network acoustic models with singular value decomposition , 2013, INTERSPEECH.
[23] Jinyu Li,et al. Investigations on hessian-free optimization for cross-entropy training of deep neural networks , 2013, INTERSPEECH.
[24] Hermann Ney,et al. RASR/NN: The RWTH neural network toolkit for speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).