Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks
Tara N. Sainath | Brian Kingsbury | Bhuvana Ramabhadran | Hagen Soltau
[1] Yann LeCun, et al. Optimal Brain Damage, 1989, NIPS.
[2] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[3] Yoshua Bengio, et al. Convolutional networks for images, speech, and time series, 1998.
[4] Mark J. F. Gales, et al. Semi-tied covariance matrices for hidden Markov models, 1999, IEEE Trans. Speech Audio Process.
[5] Michael I. Jordan, et al. Learning with Mixtures of Trees, 2001, J. Mach. Learn. Res.
[6] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[7] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.
[8] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[9] Nathaniel E. Helwig, et al. An Introduction to Linear Algebra, 2006.
[10] Geoffrey Zweig, et al. Advances in speech transcription at IBM under the DARPA EARS program, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[11] M. Yuan, et al. Dimension reduction and coefficient estimation in multivariate linear regression, 2007.
[12] Yoshua Bengio, et al. Classification using discriminative restricted Boltzmann machines, 2008, ICML '08.
[13] Yoshua Bengio, et al. Exploring Strategies for Training Deep Neural Networks, 2009, J. Mach. Learn. Res.
[14] Brian Kingsbury, et al. Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling, 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[15] Rajat Raina, et al. Large-scale deep unsupervised learning using graphics processors, 2009, ICML '09.
[16] Gillian M. Chin, et al. On the Use of Stochastic Hessian Information in Unconstrained Optimization, 2010.
[17] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[18] Lukás Burget, et al. Parallel training of neural networks for speech recognition, 2010, INTERSPEECH.
[19] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[20] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[21] Dong Yu, et al. Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition, 2010.
[22] Brian Kingsbury, et al. The IBM Attila speech recognition toolkit, 2010, 2010 IEEE Spoken Language Technology Workshop.
[23] Quoc V. Le, et al. On optimization methods for deep learning, 2011, ICML.
[24] Jorge Nocedal, et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning, 2011, SIAM J. Optim.
[25] Dong Yu, et al. Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[26] Tara N. Sainath, et al. Making Deep Belief Networks effective for large vocabulary continuous speech recognition, 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[27] Geoffrey E. Hinton, et al. Understanding how Deep Belief Networks perform acoustic modelling, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Dong Yu, et al. Exploiting sparseness in deep neural networks for large vocabulary speech recognition, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[30] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[31] Dong Yu, et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks, 2012, ICML.
[32] Navdeep Jaitly, et al. Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition, 2012, INTERSPEECH.
[33] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[34] Tara N. Sainath, et al. Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization, 2012, INTERSPEECH.
[35] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[36] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, 2012, Neural Networks: Tricks of the Trade.
[37] Christopher Ré, et al. Parallel stochastic gradient algorithms for large-scale matrix completion, 2013, Mathematical Programming Computation.