Paper Information - Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise

The problem of multi-speaker localization is formulated as a multi-class multi-label classification problem, which is solved using a convolutional neural network (CNN) based source localization method. Utilizing the common assumption of disjoint speaker activities, we propose a novel method to train the CNN using synthesized noise signals. The proposed localization method is evaluated for two speakers and compared to a well-known steered response power method.
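To illustrate what a multi-class multi-label formulation of localization can look like, the following is a minimal sketch and not the authors' implementation. It assumes the direction-of-arrival (DOA) range of 0 to 180 degrees is discretized into classes at a 5 degree resolution and that the number of speakers (two, as in the evaluation) is known; the function names `doa_to_multi_hot` and `decode_top_k` and all parameter values are hypothetical.

```python
import numpy as np


def doa_to_multi_hot(doas_deg, resolution_deg=5, max_deg=180):
    """Encode the active speakers' DOAs as a multi-hot target vector.

    Assumption: DOAs lie in [0, max_deg] and are multiples of resolution_deg.
    """
    n_classes = max_deg // resolution_deg + 1
    labels = np.zeros(n_classes, dtype=np.float32)
    for doa in doas_deg:
        labels[int(round(doa / resolution_deg))] = 1.0
    return labels


def decode_top_k(probs, k=2, resolution_deg=5):
    """Return the DOAs (in degrees, ascending) of the k most probable classes.

    Assumption: the number of speakers k is known in advance.
    """
    top_classes = np.argsort(probs)[-k:]
    return sorted(int(c) * resolution_deg for c in top_classes)


# Two speakers at 30 and 120 degrees share one target vector with two active classes.
target = doa_to_multi_hot([30, 120])
estimated_doas = decode_top_k(target, k=2)
```

In a multi-label setting each class would typically get its own independent (for example sigmoid) output, so several DOA classes can be active at once; the decoding step here simply picks the two largest per-class scores.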

Emanuël A. P. Habets | Soumitro Chakrabarty

[1] Jacob Benesty, et al. Real-time passive source localization: a practical linear-correction least-squares approach, 2001, IEEE Trans. Speech Audio Process.

[2] Francesco Piazza, et al. A neural network based algorithm for speaker localization in a multi-room environment, 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

[3] G. Carter, et al. The generalized correlation method for estimation of time delay, 1976.

[4] R. O. Schmidt, et al. Multiple emitter location and signal parameter estimation, 1986.

[5] Özgür Yilmaz, et al. On the approximate W-disjoint orthogonality of speech, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] Michael A. Arbib, et al. The handbook of brain theory and neural networks, 1995, A Bradford book.

[7] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[8] Jingdong Chen, et al. Microphone Array Signal Processing, 2008.

[9] Michael S. Brandstein, et al. A robust method for speech signal time-delay estimation in reverberant rooms, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[11] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[12] Kazunori Komatani, et al. Discriminative multiple sound source localization based on deep neural networks using independent location model, 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[13] Emanuel A. P. Habets, et al. Broadband DOA estimation using convolutional neural networks trained with noise signals, 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[14] Samy Bengio, et al. The Handbook of Brain Theory and Neural Networks, 2002.

[15] Guy J. Brown, et al. Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions, 2015, INTERSPEECH.