
Teaching Computers to Conduct Spoken Interviews: Breaking the Realtime Barrier with Learning

Several challenges remain in the effort to build software capable of conducting realtime dialogue with people. Part of the problem has been a lack of realtime flexibility, especially with regard to turntaking. We have built a system that can adapt its turntaking behavior in natural dialogue, learning to minimize unwanted interruptions and "awkward silences". The system learns this dynamically during the interaction in less than 30 turns, without special training sessions. Here we describe the system and its performance when interacting with people in the role of an interviewer. A prior evaluation of the system included 10 interactions with a single artificial agent (a non-learning version of itself); the new data consists of 10 interaction sessions with 10 different humans. Results show performance to be close to a human's in natural, polite dialogue, with 20% of the turn transitions taking place in under 300 ms and 60% in under 500 ms. The system works in real-world settings, achieving robust learning in spite of noisy data. The modularity of the architecture gives it significant potential for extensions beyond the interview scenario described here.

Kristinn R. Thórisson | Gudny Ragna Jonsdottir
