Collaborative Blind Source Separation Using Location Informed Spatial Microphones

This letter presents a new Collaborative Blind Source Separation (CBSS) technique that uses a pair of location-informed coincident microphone arrays to jointly separate simultaneous speech sources based on time-frequency source localization estimates derived from each microphone recording. Whereas existing BSS approaches rely on localization estimates of sparse time-frequency components, the proposed approach can also recover non-sparse (overlapping) time-frequency components. The proposed method is evaluated with up to three simultaneous speech sources under both anechoic and reverberant conditions. Objective and subjective measures of the perceptual quality of the separated speech show that the proposed approach significantly outperforms existing BSS approaches.

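To illustrate the general idea of localization-driven separation referred to above, the sketch below applies a simple binary time-frequency mask that keeps only the bins whose estimated direction of arrival lies near a given source. This is a minimal illustration under stated assumptions, not the collaborative method proposed in the letter: the per-bin DOA map `doa_per_bin`, the list of source angles, and the beamwidth parameter are all assumed inputs, and a hard binary mask cannot by itself recover the overlapping (non-sparse) components that the proposed CBSS approach targets.

```python
# Minimal sketch of DOA-driven time-frequency masking (illustrative only).
# Assumptions: `doa_per_bin` is a precomputed DOA estimate (degrees) for each
# STFT bin of the mixture, with the same shape as the mixture STFT; the
# function and parameter names are hypothetical, not the authors' code.
import numpy as np
from scipy.signal import stft, istft

def separate_by_doa(mix, fs, doa_per_bin, source_angles, beamwidth_deg=20.0):
    """Return one separated signal per entry in `source_angles`, obtained by
    masking STFT bins whose estimated DOA falls within the given beamwidth."""
    f, t, X = stft(mix, fs, nperseg=1024)              # mixture STFT (one channel)
    separated = []
    for angle in source_angles:
        # Angular distance between each bin's DOA and this source, wrapped to (-180, 180]
        diff = (doa_per_bin - angle + 180.0) % 360.0 - 180.0
        mask = (np.abs(diff) <= beamwidth_deg / 2.0).astype(float)
        _, s = istft(mask * X, fs, nperseg=1024)       # inverse STFT of the masked bins
        separated.append(s)
    return separated
```

In the CBSS setting described in the abstract, the per-bin localization estimates would be obtained jointly from the two coincident arrays, which is what allows overlapping time-frequency components to be attributed to more than one source; that collaborative estimation step is not reproduced in this sketch.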