Annealed f-Smoothing as a Mechanism to Speed up Neural Network Training

In this paper, we describe a method to reduce the overall number of neural network training steps across both the cross-entropy (CE) and sequence training stages. This is achieved by interpolating the frame-level CE criterion with the sequence-level state-level minimum Bayes risk (SMBR) criterion during the sequence training stage. This interpolation, known as f-smoothing, has previously been used only to prevent overfitting during sequence training; here, we investigate its application to reducing training time. We explore different interpolation strategies and achieve a reduction of up to 25% in the overall number of training steps with almost no degradation in word error rate (WER). Finally, we explore the generalization of f-smoothing to other tasks.
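To make the mechanism concrete, the sketch below shows one plausible form of the f-smoothed criterion with an annealed interpolation weight. The function names, the exponential decay schedule, and the numeric values are illustrative assumptions for exposition, not the exact implementation evaluated in the paper.

```python
import math

# Hypothetical sketch of annealed f-smoothing: interpolate a frame-level
# CE loss with a sequence-level SMBR loss, decaying the CE weight over
# training so the criterion transitions toward pure sequence training.
# All names and the schedule below are illustrative assumptions.

def f_smoothed_loss(ce_loss: float, smbr_loss: float, f: float) -> float:
    """Interpolate frame-level CE and sequence-level SMBR criteria."""
    return f * ce_loss + (1.0 - f) * smbr_loss

def annealed_f(step: int, f0: float = 0.5, decay_steps: float = 10000.0) -> float:
    """One possible annealing schedule: exponentially decay the CE weight."""
    return f0 * math.exp(-step / decay_steps)

# Example usage: combine the two criteria at each sequence-training step.
for step, (ce, smbr) in enumerate([(2.1, 0.9), (1.8, 0.8), (1.5, 0.7)]):
    loss = f_smoothed_loss(ce, smbr, annealed_f(step))
    print(f"step {step}: interpolated loss = {loss:.3f}")
```

Annealing the interpolation weight toward zero, rather than holding it fixed as in regularization-oriented f-smoothing, is what allows the CE stage to be shortened without leaving a frame-level bias in the final sequence-trained model.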
