Extension of Sparse, Adaptive Signal Decompositions to Semi-blind Audio Source Separation

We apply sparse, fast and flexible adaptive lapped orthogonal transforms to underdetermined audio source separation using the time-frequency masking framework. This normally requires the sources to overlap as little as possible in the time-frequency plane. In this work, we apply our adaptive transform schemes to the semi-blind case, in which the mixing system is already known, but the sources are unknown. By assuming that exactly two sources are active at each time-frequency index, we determine both the adaptive transforms and the estimated source coefficients using ℓ1-norm minimisation. We show average performance of 12–13 dB SDR on speech and music mixtures, and show that the adaptive transform scheme offers improvements in the order of several tenths of a dB over transforms with constant block length. Comparison with previously studied upper bounds suggests that the potential for future improvements is significant.
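The coefficient-estimation step described above can be illustrated with a minimal sketch. It assumes a two-channel instantaneous mixture with a known real mixing matrix whose columns point in pairwise distinct directions, and it works on coefficients of a single fixed transform; the adaptive choice of transform is not shown. The function name `l1_two_source_estimate` and the variables `A`, `X` and `S` are hypothetical and are not taken from the paper. For every time-frequency index, each pair of sources is tried in turn, the corresponding 2x2 mixing submatrix is inverted, and the pair whose coefficients have the smallest ℓ1 norm is kept.

```python
import itertools

import numpy as np


def l1_two_source_estimate(X, A):
    """Estimate source coefficients with exactly two active sources per index.

    X : (2, T) array of mixture transform coefficients (assumed two channels).
    A : (2, J) known mixing matrix; columns must be pairwise non-collinear.
    Returns S (J, T), the estimated source coefficients, and the (T,) array
    of l1 costs of the selected pair at each index.
    """
    n_src = A.shape[1]
    n_coef = X.shape[1]
    best_cost = np.full(n_coef, np.inf)
    S = np.zeros((n_src, n_coef))

    for j, k in itertools.combinations(range(n_src), 2):
        # Solve the 2x2 system for this candidate pair at every index at once.
        coef = np.linalg.solve(A[:, [j, k]], X)
        cost = np.abs(coef).sum(axis=0)

        # Keep this pair wherever it gives a strictly smaller l1 norm.
        better = cost < best_cost
        S[:, better] = 0.0
        S[j, better] = coef[0, better]
        S[k, better] = coef[1, better]
        best_cost[better] = cost[better]

    return S, best_cost


# Illustrative data: three sources at made-up directions, random coefficients.
angles = np.array([0.2, 0.8, 1.4])
A = np.vstack([np.cos(angles), np.sin(angles)])
X = np.random.default_rng(0).standard_normal((2, 8))
S, cost = l1_two_source_estimate(X, A)
```

Because each candidate pair solves its 2x2 system exactly, the estimate always reproduces the mixture coefficients. The ℓ1 criterion only decides which pair is declared active, so the quality of the separation depends on how sparse the sources are in the chosen transform.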
