Paper information - Predicting unseen triphones with senones

Predicting unseen triphones with senones

In large-vocabulary speech recognition, there are always new triphones that the training data does not cover. These unseen triphones are usually represented by the corresponding diphones or context-independent monophones. This paper proposes using decision-tree-based senones to generate the senonic baseforms needed for unseen triphones. A decision tree is built for each individual Markov state of each phone, and the leaves of the trees constitute the senone codebook. A Markov state of any triphone traverses the corresponding tree until it reaches a leaf, which identifies the senone to associate with that state. The method is evaluated on the DARPA 5000-word speaker-independent Wall Street Journal dictation task. The word error rate is reduced by more than 10% when unseen triphones are modeled by the decision-tree-based senones.
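The tree traversal described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the phone classes (`NASALS`, `VOWELS`), the three-state topology, the toy trees for the phone `k`, the senone IDs, and the names `find_senone` and `senonic_baseform`. The point is only that any left/right context, seen in training or not, answers the questions on the way down and always lands on some leaf.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical phone classes used as tree questions (illustrative only).
NASALS = frozenset({"m", "n", "ng"})
VOWELS = frozenset({"aa", "ae", "ih", "iy", "uw"})


@dataclass(frozen=True)
class Leaf:
    senone_id: int  # index into the senone codebook


@dataclass(frozen=True)
class Question:
    side: str               # which context to test: "left" or "right"
    phone_class: frozenset  # the question asks "is the context phone in this class?"
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]


def find_senone(tree: Node, left: str, right: str) -> int:
    """Walk one state's tree from the root to a leaf and return its senone ID."""
    node = tree
    while isinstance(node, Question):
        context = left if node.side == "left" else right
        node = node.yes if context in node.phone_class else node.no
    return node.senone_id


# One toy tree per (base phone, state index); a 3-state model is assumed.
TREES = {
    ("k", 0): Question("left", NASALS, Leaf(0),
                       Question("left", VOWELS, Leaf(1), Leaf(2))),
    ("k", 1): Leaf(3),
    ("k", 2): Question("right", VOWELS, Leaf(4), Leaf(5)),
}


def senonic_baseform(left: str, base: str, right: str, num_states: int = 3) -> list:
    """Return one senone ID per Markov state for the triphone left-base+right."""
    return [find_senone(TREES[(base, s)], left, right) for s in range(num_states)]


# A triphone such as s-k+t needs no training examples of its own:
# its contexts still answer every question in each state's tree.
print(senonic_baseform("n", "k", "iy"))
print(senonic_baseform("s", "k", "t"))
```

Because each state has its own tree, two triphones can share a senone in one state and differ in another, which is what distinguishes this state-level sharing from backing off to a whole diphone or monophone model.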
