Single-channel speech separation by using a sparse decomposition with periodic structure

In this paper, we propose a single-channel speech separation method based on sparse decomposition. First, a model of periodic signals with time-varying amplitude is introduced to approximate speech signals. The sparse decomposition is then formulated from this signal model and a sparsity measure. The sparsity measure is defined as the sum of the l2 norms of the resulting periodic subsignals, so that the decomposition finds the shortest path to the approximation. Owing to this sparsity penalty, the proposed decomposition extracts the significant periodic components from a mixture and is able to estimate the sources in mixtures of periodic signals. In experiments, we apply the proposed decomposition to speech mixtures and demonstrate speech separation with codebooks of the speakers. In addition, we compare it with the MaxVQ method, which performs separation in the frequency spectrum domain. Compared with MaxVQ, our method is less sensitive to the codebook design and requires lower computational cost.
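
To make the sparsity measure concrete, the following is a minimal sketch of how a sum of l2 norms over periodic subsignals could be evaluated. It is not the authors' implementation. The signal model used here (a one-period template repeated and scaled sample by sample by an amplitude envelope) is an assumption, and the names `periodic_subsignal` and `sparsity_measure` are hypothetical.

```python
import numpy as np

def periodic_subsignal(template, amplitude):
    """Repeat a one-period template and scale it by a time-varying amplitude.

    Assumed model (not taken from the paper): f[n] = amplitude[n] * template[n mod p],
    where p is the length of the template.
    """
    template = np.asarray(template, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    n = np.arange(amplitude.size)
    return amplitude * template[n % template.size]

def sparsity_measure(subsignals):
    """Sum of the l2 norms of the subsignals (a group-sparsity style penalty)."""
    return float(sum(np.linalg.norm(f) for f in subsignals))

# A period-3 component with a decaying envelope and an inactive period-4 component.
f3 = periodic_subsignal([1.0, 0.0, -1.0], np.linspace(1.0, 0.5, 12))
f4 = periodic_subsignal([0.0, 0.0, 0.0, 0.0], np.ones(12))
penalty = sparsity_measure([f3, f4])
```

Because each subsignal contributes its whole l2 norm, an inactive component such as `f4` adds nothing to the penalty, which is what lets a penalty of this form favor decompositions that use only a few periods.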
