Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis

This paper describes an advanced spoken language corpus constructed by enhancing an in-car speech database. The corpus has the following characteristic features: (1) Advanced tags: in addition to tags for linguistic phenomena, advanced discourse tags such as sentential structures and utterance intentions have been provided for the transcribed texts. (2) Large scale: sentential structures and intentions are currently provided for 45,053 phrases and 35,421 utterance units, respectively. (3) Multiple layers: the corpus consists of different levels of spoken language data, such as speech signals, transcribed texts, sentential structures, intentional markers, and dialogue structures, and these levels are linked to one another. The multi-layered corpus thus supports a wide variety of analyses of spontaneous spoken dialogue. This paper also reports the results of an investigation of the corpus, focusing in particular on the relations between the syntactic style and the intentional style of spoken utterances.