Enhancing LSTM RNN-Based Speech Overlap Detection by Artificially Mixed Data

This paper presents a new method for speech overlap detection based on Long Short-Term Memory (LSTM) recurrent neural networks. To this end, speech overlap data is created artificially by mixing large numbers of speech utterances. With the proposed training strategies and network structures, performance surpasses that of the considered state-of-the-art overlap detectors. We target the full ternary task of detecting non-speech, speech, and overlap. In addition, the speakers’ gender is recognised, which is the first successful combination of this kind within one model.
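
The idea of creating overlap data by mixing utterances can be illustrated with a minimal sketch. It is not the paper's actual pipeline: the function name `mix_utterances`, the per-sample activity masks, the fixed offset, and the plain additive mixing without gain normalisation are all assumptions made for illustration. The sketch adds one utterance onto another and derives ternary labels (non-speech, speech, overlap) from the number of active speakers at each sample.

```python
import numpy as np

# Ternary label values (hypothetical encoding, chosen for this sketch).
NON_SPEECH, SPEECH, OVERLAP = 0, 1, 2


def mix_utterances(a, b, offset, act_a, act_b):
    """Add utterance `b` onto utterance `a`, starting `offset` samples into `a`.

    `act_a` and `act_b` are boolean per-sample speech-activity masks, assumed
    to be available (for example from an existing annotation).
    Returns the mixed signal and one ternary label per sample.
    """
    length = max(len(a), offset + len(b))
    mixed = np.zeros(length, dtype=np.float64)
    active = np.zeros(length, dtype=np.int64)  # number of active speakers

    mixed[:len(a)] += a
    active[:len(a)] += act_a

    mixed[offset:offset + len(b)] += b
    active[offset:offset + len(b)] += act_b

    # 0 active speakers -> non-speech, 1 -> speech, 2 or more -> overlap.
    labels = np.minimum(active, OVERLAP)
    return mixed, labels


# Toy example with very short "utterances".
a = np.array([0.0, 0.5, 0.5, 0.0])
b = np.array([0.25, 0.25, 0.0])
act_a = np.array([False, True, True, False])
act_b = np.array([True, True, False])

mixed, labels = mix_utterances(a, b, offset=2, act_a=act_a, act_b=act_b)
print(labels.tolist())
```

In practice the labels would be aggregated to the frame rate of the acoustic features before training; how the paper does this is not stated in the abstract.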
