Parallel training of DNNs with Natural Gradient and Parameter Averaging
暂无分享,去创建一个
[1] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .
[2] Stan Davis,et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se , 1980 .
[3] Lalit R. Bahl,et al. Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[4] G. M. D. Corso. Estimating an Eigenvector by the Power Method with a Random Start , 1997 .
[5] Shun-ichi Amari,et al. Complexity Issues in Natural Gradient Descent Method for Training Multilayer Perceptrons , 1998, Neural Computation.
[6] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[7] Daniel Povey,et al. Minimum Phone Error and I-smoothing for improved discriminative training , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[8] H. Kushner,et al. Stochastic Approximation and Recursive Algorithms and Applications , 2003 .
[9] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[10] Nicolas Le Roux,et al. Topmoumoute Online Natural Gradient Algorithm , 2007, NIPS.
[11] Peter L. Bartlett,et al. Adaptive Online Gradient Descent , 2007, NIPS.
[12] Brian Kingsbury,et al. Evaluation of Proposed Modifications to MPE for Large Scale Discriminative Training , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[13] Dong Yu,et al. Investigation of full-sequence training of deep belief networks for speech recognition , 2010, INTERSPEECH.
[14] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[15] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[16] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.
[17] Dong Yu,et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[18] Georg Heigold,et al. An empirical study of learning rates in deep neural networks for speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[19] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[20] Michiel Bacchiani,et al. Rapid adaptation for mobile speech applications , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Lukás Burget,et al. Sequence-discriminative training of deep neural networks , 2013, INTERSPEECH.
[22] Yoshua Bengio,et al. Maxout Networks , 2013, ICML.
[23] Razvan Pascanu,et al. Natural Gradient Revisited , 2013, ICLR.
[24] Xiaohui Zhang,et al. Improving deep neural network acoustic models using generalized maxout networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Andrew W. Senior,et al. Improving DNN speaker independence with I-vector inputs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).