Microphone-location dependent mask estimation for BSS using spatially distributed asynchronous microphones

Distributed microphone array (DMA) processing has recently attracted considerable attention as a promising alternative to conventional microphone arrays with co-located elements. To perform efficient blind source separation (BSS) with spatially distributed microphones, we recently proposed an approach that estimates a microphone-location dependent mask, i.e., a microphone-dependent source activity, and performs BSS based on this information. The method exploits the fact that, in DMA scenarios, the source activity observable at a given microphone may differ significantly from that at other microphones, because the microphones are widely distributed and the level of each source varies considerably from one microphone to another. In this paper, we revisit the formulation of the proposed method and investigate its performance in the presence of drift error, i.e., the sampling frequency mismatch between different microphones. To make the method robust to drift error, we introduce several parameter initialization schemes and analyze their effect on its overall performance.
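To give a feel for the scale of the drift error discussed above, the following minimal sketch (an illustrative assumption, not part of the proposed method) computes how far two independently clocked microphones drift apart over time for a hypothetical 50 ppm clock mismatch:

```python
import numpy as np

fs_nominal = 16000.0   # nominal sampling rate in Hz
drift_ppm = 50.0       # hypothetical clock mismatch of 50 parts per million
fs_mic2 = fs_nominal * (1.0 + drift_ppm * 1e-6)  # effective rate of mic 2

# After t seconds, the sample indices of the two microphones deviate by
# fs_nominal * (drift_ppm * 1e-6) * t samples.
t = 60.0  # one minute of recording
offset_samples = fs_nominal * drift_ppm * 1e-6 * t
offset_ms = 1000.0 * offset_samples / fs_nominal

print(offset_samples, offset_ms)  # 48 samples, i.e. 3 ms after one minute
```

With a typical STFT frame shift of 10 ms (160 samples at 16 kHz), a 48-sample offset after only one minute already destroys inter-microphone phase coherence, which is why mask-based methods relying on level differences rather than phase are attractive in asynchronous DMA scenarios.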
