Speech recognition on Mandarin Call Home: a large-vocabulary, conversational, and telephone speech corpus

We describe IBM's most recent efforts in speech recognition on a conversational-speech database, the Mandarin Call Home corpus. Although it is similar to the well-known Switchboard corpus, the Call Home task addresses several major challenges in the domain of spoken language systems, including spontaneous dialogue with no pre-specified topics, a limited-bandwidth telephone signal, and the recognition of languages other than English. In particular, we describe the methodology used to address language-specific issues in the Mandarin Call Home corpus. We also compare our results with those obtained on the English Switchboard corpus. Preliminary experiments show that a character error rate of 58.7% can be achieved on the April 1995 Mandarin Call Home data set. These results are comparable to those of the state-of-the-art IBM Switchboard system trained on a similar amount of data.
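Character error rate (CER) is conventionally defined as the minimum number of character substitutions, deletions, and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference characters. The abstract does not state which scoring tool was used, so the following is only a minimal sketch assuming this standard Levenshtein-based definition and transcripts compared as plain character strings; the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum number of substitutions, deletions and insertions turning hyp into ref."""
    # prev[j] holds the distance between the first i-1 reference chars and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference char missing from hyp
                curr[j - 1] + 1,         # insertion: extra char in hyp
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = character edit distance / number of reference characters (assumed definition)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Illustrative (made-up) Mandarin pair: two substituted characters out of seven.
print(round(character_error_rate("我们明天去北京", "我们今天去南京"), 4))  # prints 0.2857
```

Because insertions are counted as errors, CER defined this way can exceed 100%, which is one reason error rates on hard conversational tasks can be high.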
