Effective Speaker Tracking Strategies for Multi-party Human-Computer Dialogue

Human-computer dialogue is a mature research field [10] that has already yielded several commercial applications, both service- and task-oriented [11]. Nevertheless, several issues remain to be tackled when unrestricted, spontaneous dialogue is concerned: barge-in (users interrupting the system or each other) must be handled properly, which makes Voice Activity Detection a crucial component [13]. Moreover, when multi-party interaction is allowed (i.e., the machine engages in dialogue with several users simultaneously), additional robustness constraints arise: the speakers must be tracked reliably, so that each utterance is mapped to the speaker who produced it. This is needed in order to perform a reliable analysis of the input utterances [2].
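The two stages mentioned above (detecting speech activity, then mapping each utterance to a speaker) can be illustrated with a minimal sketch. This is not the method of any of the cited works: the energy threshold, the nearest-centroid assignment, and the enrolled-speaker embeddings are all illustrative assumptions.

```python
import math

def rms(frame):
    # Root-mean-square energy of one audio frame (a list of samples).
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech(frames, threshold=0.1):
    # Toy energy-based VAD: a frame is "speech" if its energy
    # exceeds a fixed threshold (real VADs are far more robust).
    return [rms(f) > threshold for f in frames]

def assign_speaker(embedding, centroids):
    # Toy speaker tracking: map an utterance embedding to the
    # enrolled speaker whose centroid is most similar (cosine).
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den
    return max(centroids, key=lambda spk: cos(embedding, centroids[spk]))

# Hypothetical usage: two frames (silence, speech) and two enrolled speakers.
frames = [[0.0, 0.0, 0.0, 0.0], [0.5, -0.5, 0.5, -0.5]]
centroids = {"speaker_a": [1.0, 0.0], "speaker_b": [0.0, 1.0]}
print(detect_speech(frames))                      # [False, True]
print(assign_speaker([0.9, 0.1], centroids))      # speaker_a
```

In a real system the threshold-based VAD would be replaced by a statistical detector and the centroids by speaker models adapted online (e.g., via MLLR, as in several of the cited works); the sketch only shows the control flow: segment first, then attribute each segment to a tracked speaker.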

[1] Zhipeng Zhang et al. On-line incremental speaker adaptation for broadcast news transcription, 2002, Speech Commun.

[2] Hiroyuki Segi et al. Acoustic model adaptation by selective training using two-stage clustering, 2005.

[3] Frédéric Landragin et al. Dialogue homme-machine multimodal, 2004.

[4] The Institute of Electrical Engineers of Japan. Electronics and Communications in Japan, 2009.

[5] Roger K. Moore. Computer Speech and Language, 1986.

[6] Tetsunori Kobayashi et al. Dictation of multiparty conversation considering speaker individuality and turn taking, 2003, Systems and Computers in Japan.

[7] Petr Motlícek et al. Non-parametric speaker turn segmentation of meeting data, 2005, INTERSPEECH.

[8] Philip C. Woodland et al. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, 1995, Comput. Speech Lang.

[9] Kyoritsu Shuppan. Computer Science: ACM Computing Surveys (Japanese edition), 1978.

[10] Susan U. Philips et al. Dimensions of Sociolinguistics (review of Peter Trudgill, Sociolinguistics: An Introduction, Baltimore: Penguin Books, 1974), 1976.

[11] Jonathan Ginzburg et al. Scaling up from Dialogue to Multilogue: Some Principles and Benchmarks, 2005, ACL.

[12] Dov M. Gabbay et al. Research on Language and Computation, 2003.

[13] Corneliu Burileanu et al. Parallel training algorithms for continuous speech recognition, implemented in a message passing framework, 2006, 14th European Signal Processing Conference.

[14] Steve Young et al. The HTK Book, 1995.

[15] Staffan Larsson et al. Information state and dialogue management in the TRINDI dialogue move engine toolkit, 2000, Natural Language Engineering.

[16] Heidi Christensen et al. Speaker Adaptation of Hidden Markov Models using Maximum Likelihood Linear Regression, 1996.

[17] Kiyohiro Shikano et al. Unsupervised speaker adaptation for robust speech recognition in real environments, 2005.

[18] Alex Acero et al. Spoken Language Processing: A Guide to Theory, Algorithm and System Development, 2001.

[19] Mosur Ravishankar et al. Efficient Algorithms for Speech Recognition, 1996.

[20] Claude Barras. Reconnaissance de la parole continue : adaptation au locuteur et contrôle temporel dans les modèles de Markov cachés, 1996.

[21] Michael F. McTear et al. Book Review: Spoken Dialogue Technology: Toward the Conversational User Interface, by Michael F. McTear, 2002, CL.

[22] Holly P. Branigan et al. Perspectives on Multi-party Dialogue, 2006.

[23] P. Trudgill. Sociolinguistics: An Introduction to Language and Society, 1975.

[24] Frédéric Landragin et al. Book review of Interaction et pragmatique. Jeux de dialogue et de langage by J. Caelen & A. Xuereb, 2007.

[25] Branimir Boguraev et al. Natural Language Engineering, 1995.

[26] Kiyohiro Shikano et al. Unsupervised acoustic model adaptation algorithm using MLLR in a noisy environment, 2006.