Latent perceptual mapping: a new acoustic modeling framework for speech recognition

While hidden Markov modeling is still the dominant paradigm for speech recognition, in recent years there has been renewed interest in alternative, template-like approaches to acoustic modeling. Such methods sidestep the usual HMM limitations as well as inherent issues with parametric statistical distributions, though typically at the expense of large amounts of memory and computing power. This paper introduces a new framework, dubbed latent perceptual mapping, which naturally leverages a reduced-dimensionality description of the observations. This allows for a viable, parsimonious template-like solution where models are closely aligned with perceived acoustic events. Context-independent phoneme classification experiments conducted on the TIMIT database suggest that latent perceptual mapping achieves results comparable to conventional acoustic modeling but at potentially significant savings in online costs.
