In real spoken language applications, where speakers interact spontaneously, much apparent unpredictability makes recognition difficult. Multi-speaker spontaneous dialog, in which two speakers interact verbally to cooperatively solve a shared problem, is more varied than human-computer interaction. Spontaneous speech is not well structured and exhibits mid-utterance corrections and restarts. Discourse contains digressions, clarifications, corrections, and topic changes. Multi-speaker discourse is more varied still, with initiative effects and with speakers interacting, planning, and responding. This makes it extremely difficult to develop grammars and language models with adequate coverage and reliable stochastic parameters: perplexity increases and recognition degrades considerably relative to human-database dialog. In spite of all this, multi-speaker dialogs are structured and predictable when the discourse is appropriately modelled. We have developed heuristics to model spontaneous speech and multi-speaker dialogs. Evaluated on a corpus of more than 10,000 utterances, these heuristics were shown to predict discourse phenomena adequately and accurately. The heuristics for computing discourse structure and deriving constraints from it are generally rule based. We have used these rules to develop a set of stochastic recursive transition networks (RTNs) that capture both the rules and corpus probabilities. The resulting language model can be used predictively to generate stochastic utterance predictions dynamically, or it can be incorporated into any recognition/understanding system in which a single prior state is maintained.
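To make the idea of a stochastic RTN used predictively more concrete, the following is a minimal sketch and not the system described here. The network names, utterance-type labels, and arc counts are all hypothetical, chosen only for illustration. It assumes that arc probabilities are relative frequencies of corpus counts and that the single prior state is a (network, state) pair.

```python
from collections import defaultdict

# Hypothetical toy networks (not taken from the paper). Each network maps a
# state to its outgoing arcs, written as (label, next_state, corpus_count).
# A label starting with "@" calls a sub-network; any other label is a
# terminal utterance type.
NETWORKS = {
    "DIALOG": {
        "start": [("request", "asked", 6), ("suggest", "asked", 4)],
        "asked": [
            ("@CLARIFY", "asked", 3),
            ("accept", "end", 5),
            ("reject", "start", 2),
        ],
        "end": [],
    },
    "CLARIFY": {
        "start": [("clarify-question", "q", 1)],
        "q": [("clarify-answer", "end", 1)],
        "end": [],
    },
}


def arc_probs(network, state):
    """Turn the corpus counts on a state's arcs into relative frequencies."""
    arcs = NETWORKS[network][state]
    total = sum(count for _, _, count in arcs)
    return [(label, nxt, count / total) for label, nxt, count in arcs]


def predict(network, state):
    """Distribution over the next terminal utterance type from one prior state.

    Sub-network calls are expanded recursively from the sub-network's start
    state. This assumes no left-recursive calls, which would never terminate.
    """
    dist = defaultdict(float)
    for label, _next_state, p in arc_probs(network, state):
        if label.startswith("@"):
            for terminal, q in predict(label[1:], "start").items():
                dist[terminal] += p * q
        else:
            dist[label] += p
    return dict(dist)


if __name__ == "__main__":
    # Predictions for the utterance that follows a request or suggestion.
    for utterance_type, prob in predict("DIALOG", "asked").items():
        print(f"{utterance_type}: {prob:.2f}")
```

In this sketch a recognizer would call `predict` with the state reached after the previous utterance and use the returned distribution to weight its hypotheses. A full implementation would also need a return stack for resuming the calling network after a sub-network finishes, which is omitted here.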