Understanding the dropout strategy and analyzing its effectiveness on LVCSR

The work by Hinton et al. [8] shows that the dropout strategy can substantially improve the performance of neural networks and reduce over-fitting. Nevertheless, a more detailed study of the strategy is still lacking, and the effectiveness of dropout on large vocabulary continuous speech recognition (LVCSR) has not been analyzed. In this paper, we examine the dropout strategy in more depth. We experiment on TIMIT to measure the impact of different dropout probabilities on phone recognition performance, and, to gain an in-depth understanding of dropout, we design dropout-testing experiments from the perspective of model averaging. We then analyze the effectiveness of dropout on an LVCSR task. Results show that dropout fine-tuning combined with standard back-propagation gives significant performance improvements.
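To make the two views of dropout concrete, the following is a minimal NumPy sketch; it is our own illustration, not the implementation used in the paper, and the layer sizes, dropout probability, and function names are assumptions made for the example. It shows stochastic thinning during training and the deterministic test-time scaling that approximates averaging over the sampled thinned networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop, train=True):
    """During training, zero each unit with probability p_drop.
    At test time, drop nothing and scale by (1 - p_drop): the usual
    deterministic approximation to averaging the exponentially many
    "thinned" networks sampled during training."""
    if train:
        mask = (rng.random(x.shape) >= p_drop).astype(x.dtype)
        return x * mask
    return x * (1.0 - p_drop)

# A single ReLU hidden layer stands in for one layer of a DNN acoustic model.
W = rng.standard_normal((16, 8))
b = np.zeros(8)
x = rng.standard_normal(16)
p = 0.5  # example dropout probability (an assumption for this sketch)

def forward(x, train):
    return np.maximum(dropout(x, p, train) @ W + b, 0.0)

# "Dropout testing" as model averaging: a Monte Carlo average over many
# sampled thinned networks...
mc_avg = np.mean([forward(x, train=True) for _ in range(10000)], axis=0)
# ...versus the standard deterministic weight-scaling shortcut.
scaled = forward(x, train=False)
# The two agree only approximately: the scaling rule is exact for
# linear units but becomes an approximation once nonlinearities are involved.
print(np.max(np.abs(mc_avg - scaled)))
```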

[1] Pascal Vincent et al., "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion," Journal of Machine Learning Research, 2010.

[2] Yee Whye Teh et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, 2006.

[3] Yoshua Bengio et al., "Extracting and Composing Robust Features with Denoising Autoencoders," ICML, 2008.

[4] Tara N. Sainath et al., "Deep Belief Networks Using Discriminative Features for Phone Recognition," ICASSP, 2011.

[5] Dong Yu et al., "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," ICML, 2012.

[6] Navdeep Jaitly et al., "Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition," INTERSPEECH, 2012.

[7] Geoffrey E. Hinton et al., "Acoustic Modeling Using Deep Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing, 2012.

[8] Nitish Srivastava et al., "Improving Neural Networks by Preventing Co-adaptation of Feature Detectors," arXiv, 2012.

[9] Dong Yu et al., "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, 2012.

[10] Rong Zheng et al., "Asynchronous Stochastic Gradient Descent for DNN Training," ICASSP, 2013.