Single-channel speech separation using a sparse periodic decomposition

In this paper, we propose a single-channel speech separation method based on a sparse decomposition with a periodic signal model. In our method, a mixture of speech signals is approximated by periodic signals with time-varying amplitudes. The decomposition with the periodic signal model is performed under a sparsity penalty. Because of this penalty, each segment of the speech mixture is decomposed into periodic signals, each of which is a component of an individual speaker. To separate the speakers, we cluster the set of periodic signals with the K-means algorithm. After clustering, each cluster is assigned to its corresponding speaker using codebooks that contain the speakers' spectral features. In experiments, we compare our method with MaxVQ, which performs separation in the frequency-spectrum domain. The experimental results in terms of signal-to-distortion ratio (SDR) show that our method outperforms MaxVQ at a lower computational cost for assigning speech components to speakers.
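To make the decomposition step more concrete, the following Python sketch shows one simplified way to split a single mixture segment into periodic components. It is not the authors' algorithm. It assumes integer candidate periods, a constant amplitude within the segment (the paper's model allows time-varying amplitude), and a greedy selection driven by a hypothetical per-component penalty lam * p that stands in for the paper's sparsity penalty. The function names periodic_projection and greedy_periodic_decomposition are illustrative only, and the K-means clustering and codebook-based speaker assignment steps are not shown.

```python
import numpy as np


def periodic_projection(x: np.ndarray, p: int) -> np.ndarray:
    """Orthogonally project x onto the space of p-periodic signals of the same length.

    The template value for phase k is the mean of all samples x[n] with
    n % p == k; the projection repeats that template over the whole segment.
    """
    if not 1 <= p <= len(x):
        raise ValueError("period must lie in [1, len(x)]")
    phase = np.arange(len(x)) % p
    sums = np.bincount(phase, weights=x, minlength=p)
    counts = np.bincount(phase, minlength=p)
    template = sums / counts
    return template[phase]


def greedy_periodic_decomposition(x, periods, lam, max_components=10):
    """Greedily extract periodic components from one mixture segment.

    Simplified sketch: amplitude is assumed constant within the segment, and
    the score ||P_p r||^2 - lam * p is a hypothetical penalty standing in for
    the paper's sparsity term. Extraction stops when no candidate period
    yields a positive score.
    """
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(max_components):
        best = None
        for p in periods:
            proj = periodic_projection(residual, p)
            # Energy captured by this period, minus the per-component penalty.
            score = float(np.dot(proj, proj)) - lam * p
            if best is None or score > best[0]:
                best = (score, p, proj)
        if best is None or best[0] <= 0.0:
            break
        _, p, proj = best
        components.append((p, proj))
        residual = residual - proj
    return components, residual


if __name__ == "__main__":
    # Toy "mixture": two periodic sources with periods of 5 and 7 samples.
    n = np.arange(70)
    mixture = np.cos(2 * np.pi * n / 5) + 0.5 * np.sin(2 * np.pi * n / 7)
    comps, res = greedy_periodic_decomposition(mixture, periods=[5, 7, 10, 14], lam=0.1)
    for p, c in comps:
        print(f"period {p}: energy {np.dot(c, c):.3f}")
    print(f"residual energy: {np.dot(res, res):.3e}")
```

Because averaging within each phase class is an orthogonal projection, the captured energy ||P_p r||^2 equals the drop in residual energy at that step. The penalty on p keeps the selection from favoring long periods (such as multiples of a true period) that capture the same energy with more parameters. In the full method, the extracted components from many segments would then be clustered and assigned to speakers.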
