PHONETIC MODELLING IN THE PHILIPS CHINESE CONTINUOUS-SPEECH RECOGNITION SYSTEM

We have extended the Philips large-vocabulary continuous-speech recognition system towards Chinese. On the way from our existing Western-language technology to Mandarin, the first step was to build a suitable phonetic model. This paper describes the development of our phonetic model, excluding tones, for Mandarin Chinese. We present a systematic comparison of three forms of sub-syllabic units for Chinese (phonemes, initials/finals, and a non-tonal form of preme/toneme models), as well as whole-syllable models for reference. We include experiments on bottom-up and decision-tree-based top-down state clustering, and on modelling of cross-syllable contexts. All forms of sub-syllabic units are represented in the Philips Mandarin phone set SAMPA-C. SAMPA-C is based on the European SAMPA standard and is introduced in this paper. Our studies show that traditional half-syllable approaches slightly outperform Western-style triphones. Modelling of right-context dependency gives greater improvement than left-context dependency, and cross-syllable modelling yields a performance gain. In a free syllable decoding task, we report syllable error rates for telephone speech and for microphone dictations.
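To make the half-syllable idea concrete, the following is a minimal sketch of splitting a toneless syllable into an initial and a final. It is illustrative only: it works on Hanyu Pinyin spellings rather than on the SAMPA-C phone set, and the names `INITIALS`, `split_syllable` and `to_units` are hypothetical, not part of the Philips system. Syllables spelled with `y` or `w` are treated here as having no initial, which is a simplifying assumption.

```python
# Illustrative initial/final decomposition on toneless Pinyin.
# Assumption: Pinyin spelling stands in for the paper's SAMPA-C symbols.

# Digraph initials come first so that "zh" is matched before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable: str) -> tuple[str, str]:
    """Return (initial, final); the initial is "" for zero-initial syllables."""
    s = syllable.lower()
    for ini in INITIALS:
        # Require a non-empty remainder so a bare consonant is not split.
        if s.startswith(ini) and len(s) > len(ini):
            return ini, s[len(ini):]
    return "", s


def to_units(syllables: list[str]) -> list[str]:
    """Flatten a syllable sequence into a sequence of half-syllable units."""
    units = []
    for syl in syllables:
        ini, fin = split_syllable(syl)
        if ini:
            units.append(ini)
        units.append(fin)
    return units


if __name__ == "__main__":
    print(split_syllable("zhang"))
    print(to_units(["zhong", "wen"]))
```

A whole-syllable model would instead keep each syllable as a single unit, and a phoneme-based model would split the final further; this sketch covers only the initial/final case.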
