A continuous speaker-independent Putonghua dictation system

We describe new methods for continuous Putonghua speech recognition, which we have added to the IBM HMM-based continuous speech recognition system. First, we treat tones in Putonghua as attributes of certain phonemes rather than of syllables; we call these tone-bearing phonemes tonemes. Second, we treat instantaneous pitch as a variable in the acoustic feature vector, in the same way as the cepstra or energy. Third, we designed a set of word-segmentation rules that convert continuous (unsegmented) Chinese text into word-segmented text, which allows the trigram language model to work effectively. With these methods, a speaker-independent, very-large-vocabulary continuous Putonghua dictation system can be constructed.
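To make the toneme idea concrete, here is a minimal Python sketch that maps tone-numbered pinyin syllables to phoneme sequences in which only one phoneme carries the tone. The abstract does not say which phonemes bear the tone, so attaching it to the syllable final, the `INITIALS` table, and the `to_tonemes` function are hypothetical illustrations, not the system's actual phone set.

```python
# Hypothetical toneme mapping: the tone is treated as an attribute of a phoneme
# (assumed here to be the syllable final), not of the whole syllable.

# Two-letter initials come first so that "zh" is matched before "z".
# "y" and "w" are orthographic in pinyin but are treated as initials here for simplicity.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]


def to_tonemes(syllable: str) -> list[str]:
    """Split a tone-numbered pinyin syllable such as 'zhong1' into a
    toneless initial plus a toned final, e.g. ['zh', 'ong1']."""
    base, tone = syllable[:-1], syllable[-1]
    if not tone.isdigit():
        raise ValueError(f"expected a trailing tone digit: {syllable!r}")
    for ini in INITIALS:
        if base.startswith(ini) and len(base) > len(ini):
            return [ini, base[len(ini):] + tone]   # tone attaches to the final only
    return [base + tone]                           # zero-initial syllable, e.g. 'an4'
```

Under this assumption, `to_tonemes("zhong1")` gives `['zh', 'ong1']`: the initial `zh` needs a single toneless model, and only the final is split into tone-specific units.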
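Treating pitch as just another dimension of the acoustic feature vector can be sketched as follows. The autocorrelation pitch tracker, the 60 to 400 Hz search range, the 0.3 voicing threshold, the use of log F0, and the 0.0 placeholder for unvoiced frames are all assumptions chosen for illustration; the abstract does not say how pitch is extracted or scaled, and `estimate_pitch` and `append_pitch` are hypothetical helpers.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0, voicing=0.3):
    """Estimate F0 (Hz) of one frame by picking the autocorrelation peak.

    Returns 0.0 when the frame is silent or the normalized peak falls
    below the (hypothetical) voicing threshold, i.e. the frame is unvoiced.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                  # remove DC offset
    energy = sum(s * s for s in x)                 # lag-0 autocorrelation
    if energy == 0.0:
        return 0.0
    min_lag = max(1, int(sample_rate / fmax))      # shortest period searched
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_r > voicing else 0.0


def append_pitch(cepstra, pitch_hz):
    """Return the frame's feature vector with log F0 added as one more
    dimension, alongside the cepstra (and energy, if present)."""
    log_f0 = math.log(pitch_hz) if pitch_hz > 0 else 0.0  # placeholder for unvoiced frames
    return list(cepstra) + [log_f0]
```

The extended vector would then be modeled by the HMMs like any cepstral dimension. How unvoiced frames should be represented (the 0.0 placeholder here) is a real design question that this sketch does not settle.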
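Word trigrams need word boundaries, which written Chinese does not mark. The actual segmentation rules are not given here, so the sketch below substitutes greedy forward maximum matching against a toy lexicon, a common baseline and not the authors' rule set, and then counts the word trigrams from which a trigram model could be estimated. `max_match`, `trigram_counts`, and the lexicon are hypothetical.

```python
from collections import Counter


def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    lexicon word (up to max_len characters), else fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in lexicon:   # single characters always match
                words.append(piece)
                i += length
                break
    return words


def trigram_counts(words):
    """Count word trigrams, padding with sentence-start and sentence-end tokens."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    return Counter(tuple(padded[k:k + 3]) for k in range(len(padded) - 2))


lexicon = {"连续", "语音", "识别", "语音识别"}
words = max_match("连续语音识别", lexicon)   # ['连续', '语音识别']
counts = trigram_counts(words)
```

Because forward maximum matching prefers the longest lexicon entry, `语音识别` is kept as one word rather than split into `语音` and `识别`.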
