Speech separation algorithms for multiple speaker environments

Conventional speaker identification and speech recognition algorithms perform poorly when multiple speakers are present in the background. For high-performance speaker identification and speech recognition in multiple-speaker environments, a dedicated speech separation stage is therefore essential. Here we summarize the implementation of three speech separation techniques. Because no single method works well in all situations, the advantages and disadvantages of each are highlighted. Stand-alone software prototypes of these methods have been developed and evaluated.
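The abstract does not name the three techniques, so as a hedged illustration only (not the authors' methods), the sketch below shows one common family of speech separation approaches: blind source separation via a from-scratch symmetric FastICA, applied to two synthetically mixed sources standing in for simultaneous speakers. All function names, the mixing matrix, and the synthetic signals are illustrative assumptions.

```python
import numpy as np

def whiten(X):
    # Center and whiten the mixed signals (rows = microphone channels).
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    return E @ np.diag(d ** -0.5) @ E.T @ X

def fastica(X, n_iter=200, tol=1e-6):
    # Symmetric FastICA with a tanh nonlinearity (handles both
    # super- and sub-Gaussian sources); a sketch, not production code.
    Z = whiten(X)
    n, N = Z.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        g = np.tanh(W @ Z)
        g_prime = 1.0 - g ** 2
        W_new = (g @ Z.T) / N - np.diag(g_prime.mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^{-1/2} W
        d, E = np.linalg.eigh(W_new @ W_new.T)
        W_new = E @ np.diag(d ** -0.5) @ E.T @ W_new
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z

# Synthetic demo: two "speakers" observed through an unknown mixing matrix.
t = np.linspace(0, 1, 8000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # square-wave stand-in source
s2 = np.sin(2 * np.pi * 13 * t)           # sinusoidal stand-in source
S = np.vstack([s1, s2])
A = np.array([[0.7, 0.3], [0.4, 0.6]])    # hypothetical mixing matrix
X = A @ S                                  # what the microphones record
Y = fastica(X)                             # recovered sources (up to
                                           # permutation and sign)
```

Real multi-speaker audio adds reverberation and more sources than microphones, which is why the abstract notes that no single separation method works in every situation; ICA-style separation, for example, assumes at least as many channels as sources and an (approximately) instantaneous mixture.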
