Decision tree state tying based on segmental clustering for acoustic modeling

A fast segmental clustering approach to acoustic modeling with decision tree state tying is proposed for large vocabulary speech recognition. The approach uses a two-level clustering scheme to make decision tree state clustering robust, and it extends the conventional segmental K-means procedure to acoustic modeling based on phonetic decision tree state tying. It achieves high recognition performance while reducing model training time from days to hours compared with approaches based on Baum-Welch training. Experimental results on the standard Resource Management and Wall Street Journal tasks demonstrate the robustness and efficacy of the approach.
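As an illustration of the kind of computation that decision tree state tying involves, the following is a minimal sketch of how one phonetic question might be chosen at a tree node from segment-level statistics. It is not the procedure of this paper: the abstract does not specify the two-level scheme, so the sketch assumes the common likelihood-gain criterion with a single diagonal Gaussian per cluster, and it assumes that per-state statistics (frame count, sum, and sum of squares) have already been accumulated from a segmental (Viterbi-style) alignment. The names `cluster_loglik`, `best_split`, the question names, and the toy numbers are all hypothetical.

```python
import math

VAR_FLOOR = 1e-4  # assumed variance floor, chosen only for this sketch


def cluster_loglik(stats):
    """Log likelihood of all pooled frames under one ML diagonal Gaussian.

    stats: list of (count, sum_vector, sum_of_squares_vector), one per state.
    """
    n = sum(s[0] for s in stats)
    if n == 0:
        return 0.0
    dim = len(stats[0][1])
    total = 0.0
    for d in range(dim):
        s1 = sum(s[1][d] for s in stats)
        s2 = sum(s[2][d] for s in stats)
        mean = s1 / n
        var = max(s2 / n - mean * mean, VAR_FLOOR)
        # With ML estimates, each dimension contributes log(2*pi*var) + 1.
        total += math.log(2.0 * math.pi * var) + 1.0
    return -0.5 * n * total


def best_split(states, questions):
    """Return (question_name, gain) for the question with the largest gain.

    states: list of (left_context_phone, stats_tuple).
    questions: dict mapping question name to a set of context phones.
    Returns (None, 0.0) if no question splits the node with positive gain.
    """
    parent = cluster_loglik([st for _, st in states])
    best = (None, 0.0)
    for name, phone_set in questions.items():
        yes = [st for ctx, st in states if ctx in phone_set]
        no = [st for ctx, st in states if ctx not in phone_set]
        if not yes or not no:
            continue  # the question does not divide this node
        gain = cluster_loglik(yes) + cluster_loglik(no) - parent
        if gain > best[1]:
            best = (name, gain)
    return best


# Toy one-dimensional statistics (hypothetical): states whose left context
# is a nasal cluster near 0, the others cluster near 5.
toy_states = [
    ("m", (10, [0.0], [10.0])),
    ("n", (10, [1.0], [10.1])),
    ("p", (10, [50.0], [260.0])),
    ("t", (10, [51.0], [270.1])),
]
toy_questions = {"L_Nasal": {"m", "n"}, "L_Labial": {"m", "p"}}
chosen_question, chosen_gain = best_split(toy_states, toy_questions)
```

Because the gain depends only on accumulated counts, sums, and sums of squares, each candidate question can be scored without revisiting the training frames. This is one reason clustering on sufficient statistics can be fast.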
