Sparse Representations in Audio and Music: From Coding to Source Separation

Sparse representations have proved a powerful tool in the analysis and processing of audio signals and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the "cocktail party problem." In each case we will show how the prior assumption that the audio signals are approximately sparse in some time-frequency representation allows us to address the associated signal processing task.
