Improved Large Vocabulary Mandarin Speech Recognition Using Prosodic and Lexical Information in Maximum Entropy Framework

Tone plays an important role in distinguishing ambiguous words in Mandarin Chinese speech recognition. In this paper, we exploit pitch information in two ways. First, we interpolate the F0 contour so that it is continuous across voiced and unvoiced segments, which allows F0 to be embedded into the speech recognition system as two streams: the cepstrum and its first- and second-order derivatives constitute one stream, and F0 and its first- and second-order derivatives make up the other. Second, we use prosodic and lexical features, together with syllable context information, in a maximum entropy framework to build an explicit tone model for rescoring the lattice output by the first pass. Experimental results show that pitch information and tonal cues greatly reduce substitution errors and achieve a 3.65% absolute reduction in Chinese character error rate (CER) on a widely used Mandarin speech recognition task, the 863 test.
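The F0 stream described above can be illustrated with a minimal sketch. The abstract does not state which interpolation scheme or derivative formula is used, so this example makes several assumptions: unvoiced frames are marked with 0.0, gaps are filled by plain linear interpolation (a stand-in, not necessarily the scheme used in the paper), and derivatives are taken as central differences. The function names `interpolate_f0`, `deltas`, and `f0_stream` are hypothetical.

```python
def interpolate_f0(f0, unvoiced=0.0):
    """Return a continuous F0 contour.

    Assumption: frames equal to `unvoiced` carry no pitch. Gaps between
    voiced frames are filled linearly; leading and trailing gaps hold the
    nearest voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v != unvoiced]
    out = list(f0)
    if not voiced:
        return out  # nothing to interpolate from
    first, last = voiced[0], voiced[-1]
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1.0 - w) * f0[a] + w * f0[b]
    return out


def deltas(x):
    """Central-difference derivative with edge replication (an assumption)."""
    n = len(x)
    return [(x[min(i + 1, n - 1)] - x[max(i - 1, 0)]) / 2.0 for i in range(n)]


def f0_stream(f0):
    """Build per-frame (F0, delta, delta-delta) tuples for the pitch stream."""
    contour = interpolate_f0(f0)
    d1 = deltas(contour)
    d2 = deltas(d1)
    return list(zip(contour, d1, d2))


if __name__ == "__main__":
    raw = [0.0, 100.0, 0.0, 0.0, 130.0, 0.0]  # Hz, 0.0 = unvoiced
    for frame in f0_stream(raw):
        print(frame)
```

In a two-stream setup, these per-frame tuples would form the second observation stream alongside the cepstral stream. A smoother interpolant or a regression-window delta could be substituted without changing the overall structure.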
