Human-machine dialogue systems typically support dialogue between two agents: the human user is one agent, and the system plays the part of the other. In this scenario, the user and the system take turns at being the speaker, and when one of them is the speaker, the other is the addressee (the agent being spoken to).
However, real-life dialogue frequently involves more than two participants. There are several ways to configure an automated dialogue system for a multi-speaker scenario. Firstly, a system can simulate each dialogue participant as a separate autonomous agent (e.g. Padilha and Carletta [1]). Secondly, a system can play the part of a single agent in a context where there are several human speakers (Wang [2]). Finally, the system can support a dialogue between a single human user and several agents, all of which are played by the system. In this last case, the agents can either be genuinely autonomous or act in the service of a shared plan, delivering lines given to them by a central controller.
Whichever of the above scenarios is envisaged, extending a dialogue system to handle multi-speaker interactions requires several additional components. At the dialogue level, we need a theory of turn-taking, both to decide when the system should make an utterance and to identify the addressees of other speakers' utterances. At the level of sentence syntax and semantics, we need to pay special attention to constructions that refer to dialogue participants (especially personal pronouns) and to constructions that control turn-taking (especially terms of address).
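To make the addressee problem concrete, the following is a minimal Python sketch of one simple policy; it is an illustration only, not the method of the system described in this paper. It assumes that a term of address is a bare participant name at the start or end of an utterance, and falls back first on the two-party default (the addressee is the other agent) and then on the previous speaker. All names (Utterance, find_term_of_address, resolve_addressee) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Utterance:
    speaker: str
    text: str

    def words(self) -> List[str]:
        # Naive whitespace tokenisation; a real system would use its parser.
        return self.text.split()


def find_term_of_address(utt: Utterance, participants: Set[str]) -> Optional[str]:
    """Return a participant named as a term of address, or None.

    Simplifying assumption: a term of address is a bare participant name at the
    start or end of the utterance, e.g. "Bob, where is the key?" or
    "Where is the key, Bob?".
    """
    words = utt.words()
    edges = [words[0], words[-1]] if words else []
    for word in edges:
        name = word.strip(",.?!").capitalize()
        # A speaker does not address themselves, so skip the speaker's own name.
        if name in participants and name != utt.speaker:
            return name
    return None


def resolve_addressee(utt: Utterance, participants: Set[str],
                      previous_speaker: Optional[str]) -> Optional[str]:
    """Guess the addressee of utt using three ordered heuristics."""
    # 1. An explicit term of address takes priority.
    named = find_term_of_address(utt, participants)
    if named is not None:
        return named
    # 2. In a two-party dialogue the addressee is simply the other participant.
    others = participants - {utt.speaker}
    if len(others) == 1:
        return next(iter(others))
    # 3. Otherwise, assume the speaker is replying to whoever spoke last.
    if previous_speaker is not None and previous_speaker != utt.speaker:
        return previous_speaker
    # No evidence: the utterance may be addressed to the whole group.
    return None
```

In a fuller system, the resolved addressee could serve both as the default next speaker (loosely echoing the "current speaker selects next" rule from conversation analysis) and as the referent of singular "you"; both connections are assumptions of this sketch rather than claims about the system described here.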
We have already built a two-speaker dialogue system, which incorporates full sentence parsing and generation using a declarative grammar, and a range of standard dialogue management techniques (de Jager et al. [3]; Bayard et al. [4]). This paper describes how we are extending this system to a multi-speaker environment, focussing on the additional syntactic constructions and dialogue management principles that are required, and on how they interact.
[1] Alistair Knott et al. Syntactic disambiguation using presupposition resolution. ALTA, 2003.
[2] Jhing-Fa Wang et al. Multi-Speaker Dialogue for Mobile Information Retrieval. 2002.
[3] Candace L. Sidner et al. Attention, Intentions, and the Structure of Discourse. CL, 1986.
[4] Rob A. van der Sandt et al. Presupposition Projection as Anaphora Resolution. J. Semant., 1992.
[5] Dan Flickinger et al. Minimal Recursion Semantics: An Introduction. 2005.
[6] E. Schegloff et al. A simplest systematics for the organization of turn-taking for conversation. 1974.
[7] Philip R. Cohen et al. Discourse Processing and Commonsense Plans. 2003.
[8] J. Carletta et al. A simulation of small group discussion. 2002.
[9] Uwe Reyle et al. From discourse to logic. 1993.
[10] Brendan McCane et al. Language-driven nonverbal communication in a bilingual conversational agent. Proceedings 11th IEEE International Workshop on Program Comprehension, 2003.