A new combined modeling of continuous speech recognition

Robustly estimating a large number of parameters from a limited amount of training data is a crucial issue in triphone-based continuous speech recognition. To cope with this issue, two major context-clustering methods, agglomerative (AGG) and tree-based (TB), have been widely studied. In this paper, we analyze the advantages and disadvantages of the two algorithms and introduce a novel combined method that draws on the strengths of each to cluster and tie similar acoustic states in highly detailed acoustic models. In addition, we devise a two-level clustering approach for TB, which applies tree-based state tying a second time to rare acoustic-phonetic events. In large-vocabulary continuous speech recognition (LVCSR) experiments, the proposed combined method considerably improved recognition performance compared with using the popular TB method alone.
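
As a rough illustration of the agglomerative (bottom-up) idea mentioned above, the following minimal sketch repeatedly merges the two closest state clusters until no pair is closer than a threshold. It is not the method of this paper: the one-dimensional state means, the absolute-difference distance, the `max_dist` threshold, and the function name `agglomerative_tie` are all assumptions chosen for brevity.

```python
from itertools import combinations

def agglomerative_tie(states, max_dist):
    """Bottom-up tying of states (illustrative only).

    states:   dict mapping a state name to (mean, occupancy_count);
              1-D means are an assumption made to keep the sketch short.
    max_dist: hypothetical stopping threshold on the distance between means.
    Returns a list of sets, each set holding the names tied together.
    """
    # Start with one cluster per state: (names, mean, count).
    clusters = [({name}, mean, count) for name, (mean, count) in states.items()]
    while len(clusters) > 1:
        # Find the pair of clusters whose means are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: abs(clusters[p[0]][1] - clusters[p[1]][1]),
        )
        if abs(clusters[i][1] - clusters[j][1]) > max_dist:
            break  # no remaining pair is similar enough to tie
        names_i, mean_i, n_i = clusters[i]
        names_j, mean_j, n_j = clusters[j]
        # Merge: pool the names and take the count-weighted mean.
        merged = (
            names_i | names_j,
            (mean_i * n_i + mean_j * n_j) / (n_i + n_j),
            n_i + n_j,
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [names for names, _, _ in clusters]

# Toy triphone states: two with similar means, one far away.
toy_states = {"a-b+c": (0.0, 10), "d-b+c": (0.1, 10), "a-b+e": (5.0, 10)}
tied = agglomerative_tie(toy_states, max_dist=1.0)
```

With the toy values above, the two states with nearby means end up tied and the third stays alone. A purely bottom-up scheme like this can only group triphones that occur in the training data, which is one reason tree-based tying is commonly used alongside or instead of it.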
