Floor holder detection and end of speaker turn prediction in meetings

We propose a novel, fully automatic framework to detect which meeting participant is currently holding the conversational floor and to predict when the current speaker turn is going to finish. Two sets of experiments were conducted on a large collection of multiparty conversations: the AMI meeting corpus. Unsupervised speaker turn detection was performed by post-processing the outputs of speaker diarization and speech activity detection. A supervised end-of-speaker-turn prediction framework, based on Dynamic Bayesian Networks and automatically extracted multimodal features (related to prosody, overlapping speech, and visual motion), was also investigated. These novel approaches resulted in good floor holder detection rates (13.2% Floor Error Rate) and attained state-of-the-art end-of-speaker-turn prediction performance.
