Paper information - Robust variational speech separation using fewer microphones than speakers

A variational inference algorithm for robust speech separation, capable of recovering the underlying speech sources even in the case of more sources than microphone observations, is presented. The algorithm is based upon a generative probabilistic model that fuses time-delay of arrival (TDOA) information with prior information about the speakers and application, to produce an optimal estimate of the underlying speech sources. Simulation results are presented for the case of two, three and four underlying sources and two microphone observations corrupted by noise. The resulting SNR gains (32 dB with two sources, 23 dB with three sources, and 16 dB with four sources) are significantly higher than those of previous speech separation techniques.
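To make the TDOA cue mentioned in the abstract concrete, the following is a minimal sketch of estimating the delay between two microphone signals by brute-force cross-correlation. This is a textbook technique swapped in for illustration only; it is not the paper's variational algorithm, and the function names, the synthetic white-noise source, and the 16 kHz sample rate are all assumptions.

```python
import random


def estimate_tdoa_samples(mic_a, mic_b, max_lag):
    """Return the lag (in samples) by which mic_b trails mic_a.

    Illustrative brute-force cross-correlation; a positive result means
    the sound reached mic_a first. Not the method of the paper.
    """
    n = min(len(mic_a), len(mic_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                score += mic_a[t] * mic_b[u]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay(signal, d):
    """Delay a signal by d >= 0 samples, zero-padding the start (hypothetical helper)."""
    return [0.0] * d + signal[: len(signal) - d]


# Synthetic example: one white-noise source reaching mic_b 3 samples late.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
mic_a = source
mic_b = delay(source, 3)

lag = estimate_tdoa_samples(mic_a, mic_b, max_lag=10)
sample_rate_hz = 16000  # assumed sample rate, not taken from the paper
tdoa_seconds = lag / sample_rate_hz
print(lag, tdoa_seconds)
```

With a single clean source the correlation peak sits at the true delay. With several overlapping speakers and noise the peaks become ambiguous, which is the setting in which the abstract describes fusing TDOA information with prior information about the speakers.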

Brendan J. Frey | Parham Aarabi | Trausti T. Kristjansson | Kannan Achan | Steven J. Rennie

[1] Terrence J. Sejnowski, et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution, 1995, Neural Computation.

[2] Brendan J. Frey, et al. Learning Dynamic Noise Models from Noisy Speech for Robust Speech Recognition, 2001.

[3] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[4] Guangji Shi, et al. Multi-channel time-frequency data fusion, 2002, Proceedings of the Fifth International Conference on Information Fusion, FUSION 2002 (IEEE Cat. No. 02EX5997).

[5] Li Deng, et al. Speech Denoising and Dereverberation Using Probabilistic Models, 2000, NIPS.

[6] Erkki Oja, et al. Independent component analysis: algorithms and applications, 2000, Neural Networks.

[7] Brendan J. Frey, et al. Variational Learning in Nonlinear Gaussian Belief Networks, 1999, Neural Computation.