Estimating competing speaker count for blind speech source separation

We present a method for estimating the number of simultaneous speakers, designed for direct integration with blind speech source separation algorithms. The method was developed for single-microphone recordings but is fully compatible with microphone-array approaches. Speech source separation algorithms based on independent component analysis, multiband analysis or spectral learning require the number of concurrent speakers as an input parameter. We estimate this number by pattern matching between the spectrogram of the speech mixture and the spectrograms of a set of single-speaker references. The method was shown to scale to at least 10 concurrent speakers. Additionally, we report the separation performance of several speech separation algorithms on mixtures of three competing speech signals.
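As an illustration only, the idea of matching a mixture spectrogram against single-speaker references can be sketched as follows. This is not the method of the paper, whose features and distance measure are not given in the abstract. The sketch assumes a hypothetical descriptor (the time-averaged log-magnitude spectrum), a hypothetical template per candidate count built by summing the first k reference signals, and a Euclidean nearest-template rule. Synthetic sinusoids stand in for speech, and all function names are invented for the example.

```python
import numpy as np

def spectrogram(x, frame=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame)
    starts = range(0, len(x) - frame + 1, hop)
    frames = np.stack([x[s:s + frame] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame // 2 + 1)

def descriptor(x):
    """Assumed feature: log-magnitude spectrum averaged over time frames."""
    return np.log1p(spectrogram(x)).mean(axis=0)

def build_templates(references, max_speakers):
    """Assumed templates: one per candidate count k, from the sum of the first k references."""
    return {k: descriptor(np.sum(references[:k], axis=0))
            for k in range(1, max_speakers + 1)}

def estimate_speaker_count(mixture, templates):
    """Return the candidate count whose template is nearest (Euclidean) to the mixture."""
    d = descriptor(mixture)
    return min(templates, key=lambda k: np.linalg.norm(d - templates[k]))

# Toy data: four sinusoids at distinct frequencies stand in for single-speaker references.
t = np.arange(8000) / 8000.0
references = [np.sin(2 * np.pi * f * t) for f in (200, 450, 700, 950)]
templates = build_templates(references, max_speakers=4)

# A mixture of three of the toy sources.
mixture = references[0] + references[1] + references[2]
print(estimate_speaker_count(mixture, templates))
```

The toy mixture is built from the same signals as the templates, so the match is exact by construction. With real speech, the references and the mixture would come from different speakers, and the choice of descriptor and distance would determine how well the estimate generalizes.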
