Development of spoken language corpora for travel information

In this paper we report on our ongoing work in developing spoken language corpora in the context of information access in two travel domain tasks, L’ATIS and MASK. The collection of spoken language corpora remains an important research area and represents a significant portion of the work in developing spoken language systems. Additional acoustic and language model training data has been shown to improve performance in continuous speech recognition almost systematically. Similarly, progress in spoken language understanding is closely linked to the availability of spoken language corpora. We record subjects on a regular basis using development versions of the spoken language systems for both tasks, obtaining over 1000 queries per month from 20 subjects. To help assess our progress in system development, since March 1995 each subject has completed a questionnaire addressing the user-friendliness, reliability, ease-of-use of the MASK data
