BLIND SPEECH SEPARATION BY COMBINING BEAMFORMERS AND A TIME-FREQUENCY BINARY MASK

This paper describes a new method for blind speech separation (BSS) of convolutive mixtures. The approach builds on beamforming, a widely used speech enhancement technique, and applies it to BSS by combining a beamformer and a time-frequency binary mask (TFBM) in one system. We propose two variants that share the same basis but differ in setup. The first is designed for (over-)determined configurations, i.e., the number of sensors is equal to or greater than the number of sources. The second is designed for underdetermined configurations, i.e., the sources outnumber the sensors. Experimental results show that the proposed approach outperforms both a conventional TFBM and a conventional beamformer used alone.
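
To make the idea of pairing a beamformer with a TFBM concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the beamformer design or how the mask is built, so this sketch assumes a simple delay-and-sum beamformer with known per-sensor delays and a mask that assigns each time-frequency bin to the source whose beamformer output is largest in magnitude. All function names, parameters, and array shapes are hypothetical.

```python
import numpy as np

def is_underdetermined(n_sources, n_sensors):
    """True when the sources outnumber the sensors."""
    return n_sources > n_sensors

def delay_and_sum_weights(delays, freqs):
    """Delay-and-sum weights for one look direction (assumed design).

    delays: (M,) per-sensor delays in seconds, assumed known here.
    freqs:  (F,) frequency-bin centres in Hz.
    Returns an (F, M) array of complex weights.
    """
    steering = np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])
    return steering / delays.shape[0]

def beamform(X, W):
    """Apply weights W (F, M) to a mixture STFT X (M, F, T): y = w^H x."""
    return np.einsum("fm,mft->ft", W.conj(), X)

def separate(X, delays_per_source, freqs):
    """Beamform toward each source, then apply a binary mask.

    Each time-frequency bin is kept only in the output of the
    beamformer with the largest magnitude in that bin (an assumed
    masking rule, chosen for illustration).
    Returns (separated, masks), both of shape (N, F, T).
    """
    Y = np.stack([beamform(X, delay_and_sum_weights(d, freqs))
                  for d in delays_per_source])
    winner = np.argmax(np.abs(Y), axis=0)                    # (F, T)
    masks = winner[None] == np.arange(Y.shape[0])[:, None, None]
    return Y * masks, masks
```

In this sketch the mask removes the residual interference that the beamformer alone lets through, which is one plausible reading of why the combination can do better than either component on its own.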
