Predicting unseen triphones with senones

In large-vocabulary speech recognition, there are always new triphones that are not covered by the training data. These unseen triphones are usually represented by the corresponding diphones or by context-independent monophones. This paper proposes using decision-tree-based senones to generate the senonic baseforms needed for unseen triphones. A decision tree is built for each Markov state of each phone, and the leaves of the trees constitute the senone codebook. To find the senone associated with a Markov state of any triphone, that state traverses the corresponding tree until it reaches a leaf. The proposed method is evaluated on the DARPA 5000-word speaker-independent Wall Street Journal dictation task. Modeling unseen triphones with the decision-tree-based senones reduces the word error rate by more than 10%.
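
The senone lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system evaluated in the paper: it assumes each tree node asks a yes/no question about whether the left or right context phone belongs to some phone class, and the phone classes, the toy trees, and the names `find_senone` and `senonic_baseform` are all hypothetical. Tree construction from training data is not shown; only the traversal that assigns one senone to each Markov state of a triphone, seen or unseen, is.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical phone classes used by the questions; a real system would
# use a larger, linguistically motivated question set.
NASALS: FrozenSet[str] = frozenset({"M", "N", "NG"})
VOWELS: FrozenSet[str] = frozenset({"AA", "AE", "IY", "UW"})


@dataclass(frozen=True)
class Leaf:
    senone_id: int  # index into the shared senone codebook


@dataclass(frozen=True)
class Question:
    side: str                    # which context the question asks about: "left" or "right"
    phone_class: FrozenSet[str]  # answer is "yes" if that context phone is in this set
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]


@dataclass(frozen=True)
class Triphone:
    left: str
    center: str
    right: str


def find_senone(tree: Node, tri: Triphone) -> int:
    """Walk one Markov state's tree from the root down to a leaf."""
    node = tree
    while isinstance(node, Question):
        context = tri.left if node.side == "left" else tri.right
        node = node.yes if context in node.phone_class else node.no
    return node.senone_id


# Hypothetical toy trees: one tree per (center phone, Markov state index).
# Leaf IDs are unique across all trees, so together the leaves form the codebook.
TREES: Dict[Tuple[str, int], Node] = {
    ("AE", 0): Question("left", NASALS,
                        yes=Leaf(0),
                        no=Question("left", VOWELS, yes=Leaf(1), no=Leaf(2))),
    ("AE", 1): Question("right", NASALS, yes=Leaf(3), no=Leaf(4)),
}


def senonic_baseform(tri: Triphone, n_states: int) -> List[int]:
    """Return one senone per Markov state (raises KeyError if a tree is missing)."""
    return [find_senone(TREES[(tri.center, s)], tri) for s in range(n_states)]
```

For example, `senonic_baseform(Triphone("NG", "AE", "T"), 2)` returns `[0, 4]`: the nasal left context sends state 0 to leaf 0, and the non-nasal right context sends state 1 to leaf 4. Because every path through a tree ends at a leaf, any triphone whose center phone has a tree for each state receives a complete senonic baseform, even if that triphone never occurred in training, instead of backing off to a diphone or monophone.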
