Development of a Chinese telephony conversational corpus for speech processing [speech recognition applications]

This paper describes the development of the EARS (Effective, Affordable, Reusable Speech-to-Text) Chinese corpus, a telephony conversational speech database for speech processing. The EARS database is the first of its kind collected for spontaneous Mandarin Chinese telephone speech. The corpus is intended to capture Mandarin conversations between strangers or friends, covering a wide range of topics, over landline and cellular channels. All speech data are annotated with standard Chinese character transcriptions as well as specific mark-ups for spontaneous speech. The corpus will be used for conversational and spontaneous Mandarin speech recognition tasks under the DARPA EARS framework. This paper introduces the design, development, structure, and initial phonetic analysis of the first 50-hour collection of this corpus. An additional 300 to 500 hours of data will be collected and transcribed between 2004 and 2005.
