Acoustic Cues for Classifying Communicative Intentions in Dialogue Systems

Filled pauses are normally used as a planning strategy: they signal the speaker's intention to hold the floor in a conversation. They are typically realised by inserting a vowel (optionally followed by a nasal), but in Italian they can also be produced by lengthening the final vowel of a word. Word-final lengthening filled pauses are thus an intermediate category between lexical and non-lexical speech events. In human-machine interaction, the system should be able to discriminate between a "default" lexical speech event and one characterised by word-final lengthening used as a planning strategy: in the latter case, the associated communicative intention must also be recognised. Our preliminary investigation shows that duration and F0 shape are reliable acoustic cues for identifying word-final lengthening filled pauses in a variety of Italian.
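
As a rough illustration of how the two cues named above could feed a dialogue system, the following minimal sketch flags a word-final vowel as a candidate lengthening filled pause from its duration and the slope of its F0 contour. Everything specific here is an assumption made for illustration and is not taken from the investigation: the `FinalVowel` container, the function names, both threshold values, and the choice to treat a long vowel with a nearly flat F0 contour as the filled-pause pattern. The abstract states only that duration and F0 shape are informative, not which values or shapes separate the two classes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FinalVowel:
    """Hypothetical container for measurements of one word-final vowel."""
    duration_s: float        # vowel duration in seconds
    f0_hz: Sequence[float]   # F0 samples over the vowel, evenly spaced in time


def f0_slope(f0_hz: Sequence[float], duration_s: float) -> float:
    """Least-squares slope of the F0 contour, in Hz per second."""
    n = len(f0_hz)
    if n < 2 or duration_s <= 0:
        return 0.0  # too little data to estimate a slope
    step = duration_s / (n - 1)
    times = [i * step for i in range(n)]
    mean_t = sum(times) / n
    mean_f = sum(f0_hz) / n
    num = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0_hz))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


# Placeholder thresholds, NOT values reported by the study.
MIN_DURATION_S = 0.25            # assumed minimum duration for lengthening
MAX_ABS_SLOPE_HZ_PER_S = 40.0    # assumed bound for a "flat" F0 contour


def is_lengthening_filled_pause(vowel: FinalVowel) -> bool:
    """Assumed rule: a long final vowel with a nearly flat F0 contour."""
    long_enough = vowel.duration_s >= MIN_DURATION_S
    flat_enough = abs(f0_slope(vowel.f0_hz, vowel.duration_s)) <= MAX_ABS_SLOPE_HZ_PER_S
    return long_enough and flat_enough
```

In practice the thresholds would be replaced by a classifier trained on annotated speech, and the F0 samples would come from a pitch tracker; the sketch only shows how the two cues might be combined into a single decision.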