Acquisition and Labelling of a Spontaneous Speech Dialogue Corpus ∗

The current state of speech technologies has enabled the development of new speech-based applications such as dialogue systems, which can be applied to a variety of tasks. In a dialogue system, a computer interacts with users through dialogue, simulating a human being. Probabilistic models can be used to define the behaviour of a dialogue system, and estimating these models requires large labelled corpora. Therefore, acquiring and labelling a dialogue corpus for the task is a common preliminary step in dialogue system development. In this work, we present the acquisition and labelling of a Spanish dialogue corpus of train service queries. We describe the Wizard of Oz strategy used for the acquisition, as well as the labelling rules and the tools used for the labelling.