Improved pre-training of Deep Belief Networks using Sparse Encoding Symmetric Machines

Restricted Boltzmann Machines (RBMs) continue to be a popular method for pre-training the weights of Deep Belief Networks (DBNs). However, the RBM objective function cannot be maximized directly. It is therefore unclear which function to monitor when deciding when to stop training, which makes computational cost difficult to manage. The Sparse Encoding Symmetric Machine (SESM) has been suggested as an alternative pre-training method. By placing a sparseness term on the neural network output codebook, SESM allows the objective function to be optimized directly and monitored reliably as an indicator of when to stop training. In this paper, we explore SESM for pre-training DBNs and apply it for the first time to speech recognition. First, we provide a detailed analysis comparing the behavior of SESM and RBM. Second, we compare the performance of SESM pre-trained and RBM pre-trained DBNs on TIMIT and on a 50-hour English Broadcast News task. Results indicate that DBNs pre-trained with SESM and with RBMs achieve comparable performance and outperform randomly initialized DBNs, with SESM providing a much easier stopping criterion than RBM.
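To illustrate what monitoring a directly optimizable objective as a stopping indicator could look like, here is a minimal sketch in Python. It is not the paper's procedure: the function name `stopping_epoch` and the parameters `patience` and `min_delta` are hypothetical, and the rule shown is a generic patience-based check applied to a list of per-epoch objective values (for example, a SESM loss recorded after each epoch).

```python
def stopping_epoch(losses, patience=3, min_delta=1e-4):
    """Return the index of the epoch at which training would stop, or None.

    losses:    per-epoch values of the monitored objective (lower is better).
    patience:  hypothetical setting; number of consecutive epochs without
               sufficient improvement that triggers a stop.
    min_delta: hypothetical setting; smallest decrease counted as improvement.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(losses):
        if best - loss > min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None  # the objective kept improving; no stop was triggered


# Usage: an objective that plateaus at 0.4 triggers a stop after
# three epochs without improvement.
history = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4]
print(stopping_epoch(history))
```

A rule of this kind only makes sense when the monitored value is the quantity actually being optimized, which is the property the abstract attributes to SESM and not to RBM.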
