Factorial Hidden Markov Models for Speech Recognition: Preliminary Experiments

During the last decade, the field of speech recognition has used the theory of hidden Markov models (HMMs) with great success. At the same time, there is now a widespread perception in the speech research community that new ideas are needed to continue improving performance. This report represents a small contribution to this effort. We explore an alternative acoustic modeling approach based on Factorial Hidden Markov Models (FHMMs), which are presented as possible extensions to HMMs. We report results of phonetic classification experiments on the phonetically balanced TIMIT database that compare the performance of FHMMs with that of HMMs and parallel HMMs.

Beth Logan is a PhD student at the University of Cambridge, United Kingdom. This work was done during a summer internship.

© Digital Equipment Corporation, 1997. Cambridge Research Laboratory, One Kendall Square, Building 700, Cambridge, Massachusetts 02139, USA. CRL technical reports are available on the CRL's web page at http://www.crl.research.digital.com.
