A comparative study on acoustic and linguistic characteristics using speech from human-to-human and human-to-machine conversations

Speech translation and dialogue systems must accept conversational speech. In this paper, we discuss acoustic and linguistic characteristics based on the results of speech recognition experiments using speech from human-to-human and human-to-machine conversations. Conversational speech input to machines consists of frozen expressions, such as greetings and yes/no statements, and informative expressions conveying individual content, such as dates and telephone numbers. The former have lower perplexity and acoustic characteristics close to spontaneous speech; the latter have higher perplexity and acoustic characteristics close to read speech. Each utterance, or each inter-pausal unit, can be classified as one or the other. This knowledge will inform future research on speech translation and dialogue systems.
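The perplexity contrast described above can be illustrated with a minimal sketch. The snippet below is not the paper's method; it assumes a toy add-one-smoothed unigram model and invented example utterances, purely to show why frequently repeated frozen expressions score lower perplexity than novel informative content.

```python
import math
from collections import Counter

def unigram_perplexity(train_tokens, test_tokens):
    """Per-word perplexity of test_tokens under an add-one-smoothed
    unigram model estimated from train_tokens (toy illustration)."""
    counts = Counter(train_tokens)
    vocab = set(train_tokens) | set(test_tokens)
    total = sum(counts.values())
    log_prob = 0.0
    for tok in test_tokens:
        # add-one smoothing so unseen tokens get nonzero probability
        p = (counts[tok] + 1) / (total + len(vocab))
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(test_tokens))

# Hypothetical data: frozen expressions recur in training material,
# while informative content words (dates, numbers) are largely novel.
train = "hello yes no hello thank you yes hello".split()
frozen = "hello yes".split()
informative = "march fifteenth".split()
print(unigram_perplexity(train, frozen))       # low: seen often in training
print(unigram_perplexity(train, informative))  # high: unseen content words
```

Under this model the frozen utterance scores a markedly lower perplexity than the informative one, mirroring the linguistic distinction the abstract draws.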
