Efficient model-based speech separation and denoising using non-negative subspace analysis

We present Non-negative Subspace Analysis (NSA), a new probabilistic architecture for analyzing composite non-negative data. The NSA model provides a framework for understanding the relationship between sparse subspace approaches and mixture-model-based approaches, and it encompasses a range of models, including Sparse Non-negative Matrix Factorization (SNMF) [1] and mixture-model-based analysis as special cases. We present a convenient instantiation of the NSA model, together with an efficient variational algorithm for approximate learning and inference that combines the advantages of SNMF and mixture-model-based approaches. We report preliminary recognition results, based on NSA separation output, on the Pascal Speech Separation Challenge 2006 test set [2]. These results fall short of those achieved by Algonquin [3], a state-of-the-art mixture-model-based method, but they are notable given that NSA runs an order of magnitude faster. On the same task, NSA outperforms SNMF by more than 9% absolute in word error rate (WER).
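The comparison above is stated in absolute WER, that is, as a difference in percentage points rather than a relative reduction. As a minimal sketch, assuming WER is the standard word-level edit distance divided by the number of reference words (the scoring tool used for the reported results is not specified here), the metric can be computed as follows. The function names `word_error_rate` and `absolute_gain_points` are hypothetical and chosen only for this illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_gain_points(wer_baseline: float, wer_system: float) -> float:
    """Absolute WER improvement in percentage points (not a relative reduction)."""
    return 100.0 * (wer_baseline - wer_system)
```

For instance, a hypothetical system that moves WER from 0.50 to 0.25 gains 25 points absolute, which corresponds to a 50% relative reduction; the two figures should not be confused when reading results like the one above.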

[1] B. Frey, et al. Probabilistic Sparse Matrix Factorization, 2004.

[2] Jon Barker, et al. An audio-visual corpus for speech perception and automatic speech recognition, 2006, The Journal of the Acoustical Society of America.

[3] Enrico Bocchieri, et al. Vector quantization for the efficient computation of continuous density likelihoods, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Brendan J. Frey, et al. Learning Dynamic Noise Models from Noisy Speech for Robust Speech Recognition, 2001.

[5] John R. Hershey, et al. Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system, 2006, INTERSPEECH.

[6] Patrik O. Hoyer, et al. Non-negative Matrix Factorization with Sparseness Constraints, 2004, J. Mach. Learn. Res.

[7] Brendan J. Frey, et al. Multi-way clustering of microarray data using probabilistic sparse matrix factorization, 2005, ISMB.

[8] Brendan J. Frey, et al. Factor graphs and the sum-product algorithm, 2001, IEEE Trans. Inf. Theory.

[9] Sam T. Roweis, et al. Factorial models and refiltering for speech separation and denoising, 2003, INTERSPEECH.

[10] H. Sebastian Seung, et al. Algorithms for Non-negative Matrix Factorization, 2000, NIPS.

[11] Mikkel N. Schmidt, et al. Single-channel speech separation using sparse non-negative matrix factorization, 2006, INTERSPEECH.

[12] Vladimir I. Levenshtein, et al. Efficient reconstruction of sequences, 2001, IEEE Trans. Inf. Theory.