Stochastic data sweeping for fast DNN training

Context-dependent deep neural networks (CD-DNNs) have been successfully used in large vocabulary continuous speech recognition (LVCSR). However, the immense computational cost of mini-batch based back-propagation (BP) training has become a major obstacle to utilizing massive speech data for DNN training. Previous work on accelerating BP training has mainly focused on parallelization with multiple GPUs. In this paper, a novel stochastic data sweeping (SDS) framework is proposed from a different perspective to speed up DNN training on a single GPU. Part of the training data is randomly selected from the whole set, and the quantity is gradually reduced at each training epoch. SDS utilizes less data over the entire process and consequently saves tremendous training time. Since SDS works at the data level, it is complementary to parallel training strategies and can be combined with them to form a much faster training framework. Experiments showed that combining SDS with asynchronous stochastic gradient descent (ASGD) achieves almost a 3.0 times speed-up on 2 GPUs with no loss of recognition accuracy.
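The core SDS idea described above — randomly selecting a subset of the training data whose size shrinks at each epoch — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear decay schedule, the `start_frac`/`end_frac` fractions, and the function name `sds_subset` are all assumptions for the sake of the example.

```python
import random

def sds_subset(data, epoch, num_epochs, start_frac=1.0, end_frac=0.3, seed=None):
    """Randomly sample an epoch-dependent fraction of the training data.

    The retained fraction decays linearly from start_frac (first epoch)
    to end_frac (last epoch). The decay schedule and fractions here are
    illustrative assumptions, not the settings used in the paper.
    """
    t = epoch / max(num_epochs - 1, 1)          # training progress in [0, 1]
    frac = start_frac + (end_frac - start_frac) * t
    k = max(1, int(len(data) * frac))           # subset size for this epoch
    rng = random.Random(seed)
    return rng.sample(data, k)                  # uniform sample without replacement

# Example: a 10-epoch sweep over 1000 training utterances.
data = list(range(1000))
sizes = [len(sds_subset(data, e, 10, seed=e)) for e in range(10)]
# Subset sizes shrink from 1000 down to 300 across the epochs.
```

Because each epoch draws a fresh random subset, the model still sees most of the corpus over the full run while each individual epoch processes fewer mini-batches, which is where the training-time savings come from.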
