Improving tone recognition with combined frequency and amplitude modelling

To improve tone recognition in continuous speech, we propose a strategy that separates regions influenced by tonal coarticulation from regions that more closely approximate canonical tone production. Given a syllable segmentation, the approach uses amplitude and pitch information to generate an improved sub-syllable segmentation and feature representation. The sub-syllable segmentation is derived from the convex hull of the amplitude-pitch plot. This segmentation strategy achieves a 15% improvement over a simple time-only segmentation. Finally, we discuss a future extension with sequential labelling.
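
As a rough illustration of the geometric step only, the sketch below computes the convex hull of one syllable's frames plotted as (amplitude, pitch) points and reports which frames lie on the hull. The abstract does not specify how sub-syllable boundaries are read off the hull, so this is not the authors' procedure; the function name `hull_frames`, the per-frame input layout, and the use of Andrew's monotone chain algorithm are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One analysis frame: (amplitude, pitch, frame index). Hypothetical layout.
Frame = Tuple[float, float, int]


def _cross(o: Frame, a: Frame, b: Frame) -> float:
    """2-D cross product of vectors OA and OB in the amplitude-pitch plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _half_hull(points: Sequence[Frame]) -> List[Frame]:
    """Build one chain of the hull, dropping non-left turns and collinear points."""
    chain: List[Frame] = []
    for pt in points:
        while len(chain) >= 2 and _cross(chain[-2], chain[-1], pt) <= 0:
            chain.pop()
        chain.append(pt)
    return chain


def hull_frames(amplitude: Sequence[float], pitch: Sequence[float]) -> List[int]:
    """Return the sorted indices of frames on the convex hull of the
    amplitude-pitch plot (Andrew's monotone chain).

    Assumption: amplitude[i] and pitch[i] describe the same frame i of a
    single, already segmented syllable.
    """
    if len(amplitude) != len(pitch):
        raise ValueError("amplitude and pitch must have the same length")
    pts = sorted((a, p, i) for i, (a, p) in enumerate(zip(amplitude, pitch)))
    if len(pts) <= 2:
        return sorted(i for _, _, i in pts)
    lower = _half_hull(pts)
    upper = _half_hull(list(reversed(pts)))
    # The last point of each chain is the first point of the other one.
    hull = lower[:-1] + upper[:-1]
    return sorted({i for _, _, i in hull})


if __name__ == "__main__":
    # Toy values, not real speech data: four corner frames and one interior frame.
    amp = [0.0, 2.0, 1.0, 2.0, 0.0]
    f0 = [0.0, 0.0, 1.0, 2.0, 2.0]
    print(hull_frames(amp, f0))
```

Frames on the hull are the extreme points of the amplitude-pitch trajectory. Turning them into region boundaries would need a further rule that the abstract does not give.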
