Model-based non-negative matrix factorization for single-channel speech separation

A model-based non-negative matrix factorization (NMF) algorithm is formulated for single-channel speech source separation. Given linguistic priors for the speech sources, the state-aligned spectral envelopes of the sources are inferred from acoustic models. NMF is initialized with these computed spectral envelopes and then used to estimate the spectral envelope trajectory of each source. Each source's speech is subsequently generated by reshaping the mixture spectra according to the estimated spectral envelopes, followed by reduction of interfering harmonic components. An iterative process is developed to obtain more reliable time alignment and hence better separation performance. Experimental results show that the proposed algorithm can successfully separate two speech sources of equal intensity.
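
To make the idea of envelope-initialized NMF concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes the standard Lee-Seung multiplicative updates with a Euclidean cost, whereas the abstract does not state which cost function or update rule is used. The function names `nmf_with_init` and `soft_masks` are hypothetical, the "envelopes" are random non-negative vectors standing in for model-derived spectral envelopes, and the ratio mask is only one plausible way of reshaping the mixture spectra.

```python
import numpy as np

def nmf_with_init(V, W_init, n_iter=200, eps=1e-12, seed=0):
    """Factor V ~= W @ H with Lee-Seung multiplicative updates (Euclidean cost).

    V      : non-negative magnitude spectrogram, shape (n_freq, n_frames)
    W_init : non-negative initial basis, shape (n_freq, n_components);
             here it stands in for model-derived spectral envelopes (assumption)
    """
    rng = np.random.default_rng(seed)
    W = np.array(W_init, dtype=float)           # copy so the caller's array is untouched
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)    # refine the basis from its initial value
    return W, H

def soft_masks(W, H, groups, eps=1e-12):
    """Ratio mask per source; `groups` lists the component indices of each source."""
    total = W @ H + eps
    return [(W[:, g] @ H[g, :]) / total for g in groups]

# Synthetic stand-in for a two-source mixture (illustrative data only).
rng = np.random.default_rng(1)
W_true = rng.random((64, 4)) + 0.1              # 2 components per source
H_true = rng.random((4, 50)) + 0.1
V = W_true @ H_true                             # "mixture" magnitude spectrogram

# Perturbed envelopes play the role of the model-based initialization.
W_init = W_true + 0.05 * rng.random(W_true.shape)

W0, H0 = nmf_with_init(V, W_init, n_iter=0)     # state before any update
W, H = nmf_with_init(V, W_init, n_iter=200)     # state after the updates
err_before = np.linalg.norm(V - W0 @ H0)
err_after = np.linalg.norm(V - W @ H)

groups = [[0, 1], [2, 3]]                       # components assigned to source 1 and 2
masks = soft_masks(W, H, groups)
source_estimates = [m * V for m in masks]       # reshaped mixture spectra per source
```

Starting `W` from informed envelopes rather than random values keeps each component tied to a known source, which is what allows the components to be grouped by source afterwards without a separate clustering step.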
