Paper information - Improving tone recognition with combined frequency and amplitude modelling


To improve tone recognition in continuous speech, we propose a strategy that separates regions influenced by tonal coarticulation from regions that more closely approximate canonical tone production. Given a syllable segmentation, the approach uses amplitude and pitch information to generate an improved sub-syllable segmentation and feature representation. The sub-syllable segmentation is derived from the convex hull of the amplitude-pitch plot. This segmentation strategy achieves a 15% improvement over a simple time-only segmentation. Finally, a future extension with sequential labelling is discussed.
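As a rough illustration of the geometric step named in the abstract, the sketch below computes the convex hull of a syllable's frames plotted as (amplitude, pitch) points, using Andrew's monotone chain algorithm. The abstract does not say how the hull is turned into sub-syllable boundaries, so this covers only the hull computation. The function names, the frame values, and the choice of hull algorithm are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (amplitude, pitch) for one frame; units are illustrative


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_frames(frames: List[Point]) -> List[int]:
    """Return the indices of the frames that are vertices of the convex hull,
    in counter-clockwise order (Andrew's monotone chain).

    Assumes at least three non-collinear frames for a meaningful hull.
    """
    order = sorted(range(len(frames)), key=lambda i: frames[i])
    if len(order) < 3:
        return order

    def half(indices):
        # Build one chain of the hull, dropping points that do not turn left.
        chain = []
        for i in indices:
            while len(chain) >= 2 and cross(
                frames[chain[-2]], frames[chain[-1]], frames[i]
            ) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower = half(order)
    upper = half(reversed(order))
    # The last index of each chain is the first index of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical frames for one syllable: (normalised amplitude, pitch in Hz).
frames = [
    (0.2, 180.0), (0.5, 200.0), (0.9, 210.0), (0.8, 190.0),
    (0.6, 195.0), (0.4, 170.0), (0.1, 175.0),
]
hull = convex_hull_frames(frames)
print("frames on the hull:", sorted(hull))
```

Returning frame indices rather than coordinates keeps the link back to time, which any later segmentation step would presumably need.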

Gina-Anne Levow | Siwei Wang
