Audio Source Separation by Time-Frequency Masking using a Signal-Adaptive Local Cosine Transform

∗ Corresponding author. Email addresses: andrew.nesbit@elec.qmul.ac.uk (Andrew Nesbit), mark.plumbley@elec.qmul.ac.uk (Mark D. Plumbley), mike.davies@ed.ac.uk (Mike E. Davies).
1 Supported by the Department of Electronic Engineering, Queen Mary, University of London, and the Semantic Interaction with Music Audio

Audio source separation of instantaneous two-channel mixtures by time-frequency masking relies on the sources having (approximately) disjoint representations in some transform domain. We investigate the use of cosine packet (CP) trees to compute such a transform. A computationally efficient best basis algorithm is applied to trees of local cosine bases to select an appropriate transform. We concentrate on demixing the sources by binary masking and assume that the mixing parameters are known. We develop a heuristically motivated cost function that guides the basis selection towards maximising the energy of the transform coefficients associated with a particular source. Finally, we evaluate the proposed method against better-known fixed transforms, such as the short-time Fourier transform (STFT) and the modified discrete cosine transform (MDCT). We show that, in some circumstances, adaptively selecting local cosine bases can give better separation results than fixed-basis representations.
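
As a rough illustration of the pipeline summarised above, the following Python/NumPy sketch combines binary masking with a known mixing matrix and a bottom-up best basis search over a dyadic tree of cosine blocks. It rests on simplifying assumptions made for this illustration, not details taken from the paper: each block uses a plain orthonormal DCT-IV with rectangular windows instead of the smooth-bell local cosine bases of a true CP tree; the mixing parameters are the unit-norm columns of a known 2 x J matrix `A`; each coefficient is assigned to the source whose mixing direction it best matches; and the node cost -sum_k max_j (a_j . x_k)^2 is a hypothetical stand-in for the heuristic cost function, whose exact form is not given here. All function names (`dct4`, `assign`, `node_cost`, `best_segmentation`, `separate`) are illustrative.

```python
import numpy as np


def dct4(n):
    """Orthonormal DCT-IV matrix. It is symmetric and orthogonal, so it is its own inverse."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.sqrt(2.0 / n) * np.cos(np.pi / n * (k + 0.5) * (t + 0.5))


def assign(A, X):
    """Binary mask: each coefficient column x_k of X (2 x K) goes to the source j whose
    unit-norm mixing direction a_j (column of A, 2 x J) maximises |a_j . x_k|."""
    proj = A.T @ X                              # (J x K) projections a_j . x_k
    owner = np.argmax(np.abs(proj), axis=0)     # index of the closest mixing direction
    return proj, owner


def node_cost(A, X):
    """Hypothetical stand-in cost: minus the energy the mask captures along the
    mixing directions, -sum_k max_j (a_j . x_k)^2. By Cauchy-Schwarz it is bounded
    below by -||X||^2, reached when every coefficient lies along one mixing direction."""
    proj, owner = assign(A, X)
    return -float(np.sum(proj[owner, np.arange(X.shape[1])] ** 2))


def best_segmentation(A, x, start, length, depth, max_depth):
    """Bottom-up best basis search over a dyadic segmentation of x (2 x N).
    Returns (cost, list of (start, length) blocks)."""
    C = dct4(length)
    X = x[:, start:start + length] @ C          # C is symmetric: applies DCT-IV to each channel
    here = node_cost(A, X)
    if depth == max_depth or length % 2:
        return here, [(start, length)]
    half = length // 2
    cost_l, blocks_l = best_segmentation(A, x, start, half, depth + 1, max_depth)
    cost_r, blocks_r = best_segmentation(A, x, start + half, half, depth + 1, max_depth)
    if cost_l + cost_r < here:                  # keep the split only if it is strictly cheaper
        return cost_l + cost_r, blocks_l + blocks_r
    return here, [(start, length)]


def separate(A, x, max_depth=4):
    """Estimate J sources (J x N) from a 2 x N instantaneous mixture x = A s."""
    _, blocks = best_segmentation(A, x, 0, x.shape[1], 0, max_depth)
    J = A.shape[1]
    est = np.zeros((J, x.shape[1]))
    for start, length in blocks:
        C = dct4(length)
        proj, owner = assign(A, x[:, start:start + length] @ C)
        for j in range(J):
            coeffs = np.where(owner == j, proj[j], 0.0)    # masked coefficients of source j
            est[j, start:start + length] = C @ coeffs       # inverse DCT-IV (C is self-inverse)
    return est, blocks


if __name__ == "__main__":
    # Toy example: three DCT-IV atoms, two of which share the first half of the signal.
    C32 = dct4(32)
    S = np.zeros((3, 64))
    S[0, :32] = C32[3]
    S[1, :32] = 0.5 * C32[10]
    S[2, 32:] = C32[5]
    theta = np.array([np.pi / 8, 3 * np.pi / 8, 5 * np.pi / 8])
    A = np.vstack([np.cos(theta), np.sin(theta)])   # unit-norm mixing directions
    est, blocks = separate(A, A @ S, max_depth=3)
    print("blocks:", blocks)
    print("max abs error:", np.max(np.abs(est - S)))
```

In the toy example, the two sources in the first half are disjoint only in the 32-sample DCT-IV basis, so the search should keep that half as a single block: splitting it further smears both atoms across shared coefficients and lowers the energy the mask captures. Because this cost is additive over blocks, comparing each parent against the sum of its best children yields the lowest-cost segmentation in the tree, which is what makes the best basis search efficient.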
