Paper information - The Carnegie Mellon Communicator Corpus

The Carnegie Mellon Communicator Corpus

As part of the DARPA Communicator program, Carnegie Mellon has, over the past three years, collected a large corpus of speech produced by callers to its Travel Planning system. To date, a total of 180,605 utterances (90.9 hours) have been collected. The data were used for a number of purposes, including acoustic and language modeling and the development of a spoken dialog system. The collection, transcription and annotation of these data prompted us to develop a number of procedures for managing the transcription process and for ensuring accuracy. We describe these, as well as some results based on these data. A portion of this corpus, covering the years 1999-2001, is being published for research purposes.

Alexander I. Rudnicky | Christina L. Bennett

[1] Alexander I. Rudnicky, et al. Dialog analysis in the Carnegie Mellon Communicator, 1999, EUROSPEECH.

[2] Maxine Eskénazi, et al. Data collection and processing in the Carnegie Mellon Communicator, 1999, EUROSPEECH.

[3] Alexander I. Rudnicky, et al. Modeling the cost of misunderstanding errors in the CMU Communicator dialog system, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '01).

[4] Rong Zhang, et al. Is this conversation on track?, 2001, INTERSPEECH.