Factorial Hidden Restricted Boltzmann Machines for noise robust speech recognition

We present the Factorial Hidden Restricted Boltzmann Machine (FHRBM) for robust speech recognition. Speech and noise are modeled as independent RBMs, and the interaction between them is explicitly modeled to capture how speech and noise combine to generate observed noisy speech features. In contrast with RBMs, in which the bottom layer of random variables is observed, exact inference in the FHRBM is intractable, scaling exponentially with the number of hidden units. We introduce variational algorithms for efficient approximate inference that scale linearly with the number of hidden units. Compared with traditional factorial models of noisy speech, which are based on GMMs, the FHRBM has the advantage that the representations of both speech and noise are highly distributed, allowing the model to learn a parts-based representation of noisy speech data that can generalize better to previously unseen noise compositions. Preliminary results suggest that the approach is promising.
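The abstract does not give the inference equations, so the following is only a minimal NumPy sketch of the contrast between exact and variational inference, under simplifying assumptions that are ours and not the paper's. Both sources are taken to be unit-variance Gaussian-Bernoulli RBMs, and noisy speech is modeled as y = x + n + Gaussian noise with variance `sigma2` (the actual FHRBM interaction for log-spectral features is nonlinear). Integrating out x and n leaves a posterior over the stacked speech and noise hidden units that is a quadratic function of binary variables. Exact inference must sum over all 2^H hidden states, whereas a fully factorized (mean-field) approximation updates one unit at a time. All function names (`posterior_quadratic`, `mean_field`, `elbo`, `exact_log_norm`) are hypothetical.

```python
import itertools

import numpy as np


def posterior_quadratic(y, Ws, bs, cs, Wn, bn, cn, sigma2):
    """Return (a, Q) with log p(h | y) = a @ h + 0.5 * h @ Q @ h + const.

    Toy model (an assumption, not the paper's interaction function):
      speech x ~ unit-variance Gaussian-Bernoulli RBM (Ws, bs, cs)
      noise  n ~ unit-variance Gaussian-Bernoulli RBM (Wn, bn, cn)
      observed y = x + n + eps, with eps ~ N(0, sigma2 * I)
    h stacks the speech hidden units followed by the noise hidden units.
    """
    Hs = Ws.shape[1]
    W = np.hstack([Ws, Wn])          # D x H, with H = Hs + Hn
    v = 2.0 + sigma2                 # Var(y_d | h): x, n and eps each contribute
    r = y - bs - bn                  # observation minus both visible biases
    # Each RBM's marginal is p(h) ∝ exp(c @ h + b @ W @ h + 0.5 * |W h|^2).
    Q = -(W.T @ W) / v
    Q[:Hs, :Hs] += Ws.T @ Ws
    Q[Hs:, Hs:] += Wn.T @ Wn
    a = np.concatenate([cs + Ws.T @ bs, cn + Wn.T @ bn]) + (W.T @ r) / v
    return a, Q


def expected_log_f(m, a, Q):
    """E_q[a @ h + 0.5 h @ Q @ h] for independent Bernoulli(m); uses h_k^2 = h_k."""
    d = np.diag(Q)
    return a @ m + 0.5 * (m @ Q @ m - d @ (m * m) + d @ m)


def elbo(m, a, Q):
    """Evidence lower bound, up to the same constant as exact_log_norm."""
    m = np.clip(m, 1e-12, 1 - 1e-12)
    entropy = -np.sum(m * np.log(m) + (1 - m) * np.log(1 - m))
    return expected_log_f(m, a, Q) + entropy


def exact_log_norm(a, Q):
    """log sum_h exp(a @ h + 0.5 h @ Q @ h), brute force over all 2^H states."""
    states = np.array(list(itertools.product([0.0, 1.0], repeat=len(a))))
    vals = states @ a + 0.5 * np.einsum("si,ij,sj->s", states, Q, states)
    top = vals.max()
    return top + np.log(np.exp(vals - top).sum())


def mean_field(a, Q, n_sweeps=30):
    """Coordinate-ascent mean-field with q(h) = prod_k Bernoulli(m_k).

    Each update sets m_k to the exact maximizer of the ELBO in m_k, so the
    ELBO never decreases. A sweep visits every hidden unit once.
    """
    m = np.full(len(a), 0.5)
    history = [elbo(m, a, Q)]
    for _ in range(n_sweeps):
        for k in range(len(a)):
            # Field on unit k: its bias, its diagonal term, the other units' means.
            field = a[k] + 0.5 * Q[k, k] + Q[k] @ m - Q[k, k] * m[k]
            m[k] = 1.0 / (1.0 + np.exp(-field))
        history.append(elbo(m, a, Q))
    return m, history


# Usage on small random parameters (hypothetical sizes: D=6, Hs=5, Hn=4).
rng = np.random.default_rng(0)
D, Hs, Hn = 6, 5, 4
Ws, Wn = 0.3 * rng.standard_normal((D, Hs)), 0.3 * rng.standard_normal((D, Hn))
bs, bn = rng.standard_normal(D), rng.standard_normal(D)
cs, cn = rng.standard_normal(Hs), rng.standard_normal(Hn)
y = rng.standard_normal(D)

a, Q = posterior_quadratic(y, Ws, bs, cs, Wn, bn, cn, sigma2=0.1)
m, history = mean_field(a, Q)
print("mean-field ELBO:", history[-1])
print("exact log-normalizer:", exact_log_norm(a, Q))
```

With only 9 hidden units the brute-force sum is cheap and shows how far the mean-field bound sits below the exact value. Each mean-field update here reads one row of the H x H matrix `Q` for brevity; computing the same field from the weight columns and a running reconstruction keeps a sweep at O(H·D), i.e. linear in the number of hidden units. The enumeration instead doubles in cost with every added unit.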