Overlapping Speech Detection with Cluster-based HMM Framework

Overlapping speech is known to be a major source of error in many speech processing algorithms. Many previous studies on overlapping speech detection have focused on exploring feature sets that represent the characteristics of speech and overlapping speech, while using the HMM framework for modeling. In this study, however, we hypothesize that the capacity of a single HMM is not sufficient to cover the entire distribution of speech and overlapping speech. We therefore propose a simple cluster-based HMM framework that constructs multiple speech and overlapping speech models. Experimental results on the GRID corpus show significant improvements compared to the conventional overlap detection system.
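The abstract does not specify the clustering method, the features, or the HMM topology. The following is only a toy sketch of the general idea under stated assumptions. It is not the authors' implementation:

- Frames of each class ("speech", "overlap") are split into clusters with k-means.
- Each cluster gets one diagonal-Gaussian HMM state. This is a simplification; a real system would more likely train a multi-state HMM per cluster.
- All states are joined into one ergodic HMM with a single fixed self-loop probability.
- A frame sequence is labelled by Viterbi decoding.

All function names and the toy 1-D features are hypothetical.

```python
import math

# Hypothetical sketch: per-class k-means clusters, one diagonal Gaussian state per
# cluster, all states joined in one ergodic HMM and decoded with Viterbi.

def kmeans(points, k, iters=20):
    """Plain k-means on equal-length tuples; returns the k groups of points."""
    assert len(points) >= k, "need at least k points"
    step = len(points) // k
    # Deterministic init (an assumption): k points spaced evenly through the list.
    centers = [list(points[i * step]) for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return groups

def fit_gaussian(frames, floor=1e-3):
    """Diagonal-covariance Gaussian; the variance floor avoids degenerate clusters."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    var = [max(floor, sum((x - m) ** 2 for x in col) / n) for col, m in zip(zip(*frames), mean)]
    return mean, var

def log_gauss(x, model):
    """Log-density of frame x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))

def train_states(frames_by_class, k):
    """Build one (class_label, gaussian) state for every non-empty cluster of every class."""
    states = []
    for label, frames in frames_by_class.items():
        for group in kmeans(frames, k):
            if group:
                states.append((label, fit_gaussian(group)))
    return states

def viterbi(seq, states, stay=0.9):
    """Ergodic HMM: stay in a state w.p. `stay` (0 < stay < 1), else jump uniformly."""
    n = len(states)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (n - 1)) if n > 1 else float("-inf")
    # Uniform initial distribution over states.
    score = [-math.log(n) + log_gauss(seq[0], g) for _, g in states]
    back = []
    for x in seq[1:]:
        ptrs, new = [], []
        for j, (_, g) in enumerate(states):
            best_i = max(range(n), key=lambda i: score[i] + (log_stay if i == j else log_move))
            best = score[best_i] + (log_stay if best_i == j else log_move)
            ptrs.append(best_i)
            new.append(best + log_gauss(x, g))
        back.append(ptrs)
        score = new
    # Backtrace from the best final state, then map states to class labels.
    j = max(range(n), key=lambda i: score[i])
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    path.reverse()
    return [states[s][0] for s in path]

if __name__ == "__main__":
    # Toy 1-D "features": two clusters per class.
    data = {
        "speech": [(0.0,), (0.2,), (-0.1,), (2.0,), (2.1,), (1.9,)],
        "overlap": [(6.0,), (6.2,), (5.9,), (9.0,), (9.1,), (8.8,)],
    }
    states = train_states(data, k=2)
    print(viterbi([(0.1,), (0.0,), (6.1,), (6.0,), (8.9,), (2.0,)], states))
    # -> ['speech', 'speech', 'overlap', 'overlap', 'overlap', 'speech']
```

Splitting each class into several states lets the decoder pick whichever cluster best explains a frame. A single model per class would have to stretch one distribution over all of them, which is the capacity limitation the abstract hypothesizes. The uniform, class-agnostic transition model is purely an assumption made to keep the sketch short.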
