A comparison of molecular approaches for generating sparse and structured multiresolution representations of audio and music signals

The authors investigate the characteristics and performance of joint (single-step) and sequential (two-step) approaches to creating sparse and structured multiresolution representations of audio and music signals derived using sparse overcomplete methods. A joint approach, such as molecular matching pursuit, attempts to find structures in a signal as part of the decomposition process, while a sequential approach, such as agglomerative clustering, attempts to find structures in the completed decomposition of a signal. Each of these approaches has different benefits and drawbacks. For a joint approach, it is computationally convenient that the decomposition and structuring are done simultaneously, but usually only simple structural relations are possible. For a sequential approach, one works in a parameter space of much smaller dimension than the original signal, but the computational cost is higher because the decomposition and the structure building are two separate processes. Results from these approaches using real audio and music signals will be compared and contrasted, and will contribute to our goal of creating an enhanced interface between the content of audio and music signals (e.g., onsets, notes, voices) and their multiresolution sparse atomic decompositions.
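To make the sequential (two-step) idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy dictionary of Hann-windowed cosine atoms at two scales, runs a plain matching pursuit to obtain a sparse decomposition, and then groups the selected atoms into "molecules" by single-linkage agglomerative merging in the (center time, frequency) parameter space. All names (`gabor_atom`, `build_dictionary`, `matching_pursuit`, `agglomerate`), the tolerances, and the toy signal are hypothetical choices made for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gabor_atom(n, scale, shift, freq):
    """Unit-norm Hann-windowed cosine: `scale` samples long, starting at `shift`.
    `freq` is in cycles per sample (an assumption of this sketch)."""
    atom = [0.0] * n
    for i in range(scale):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (i + 0.5) / scale)
        atom[shift + i] = window * math.cos(2 * math.pi * freq * i)
    norm = math.sqrt(dot(atom, atom))
    return [a / norm for a in atom]

def build_dictionary(n, scales, freqs):
    """List of (params, atom) pairs, with params = (scale, center_time, freq)."""
    dictionary = []
    for scale in scales:
        for shift in range(0, n - scale + 1, scale // 2):
            for freq in freqs:
                params = (scale, shift + scale / 2, freq)
                dictionary.append((params, gabor_atom(n, scale, shift, freq)))
    return dictionary

def matching_pursuit(signal, dictionary, n_iter, min_coef=1e-6):
    """Step 1: greedy decomposition. Returns the book of (params, coef) and the residual."""
    residual = list(signal)
    book = []
    for _ in range(n_iter):
        params, atom = max(dictionary, key=lambda e: abs(dot(residual, e[1])))
        coef = dot(residual, atom)
        if abs(coef) < min_coef:  # nothing significant left to model
            break
        residual = [r - coef * a for r, a in zip(residual, atom)]
        book.append((params, coef))
    return book, residual

def agglomerate(book, time_tol, freq_tol):
    """Step 2: merge clusters while any two contain atoms that are close in
    both center time and frequency (single linkage with fixed tolerances)."""
    def close(a, b):
        return (abs(a[0][1] - b[0][1]) <= time_tol
                and abs(a[0][2] - b[0][2]) <= freq_tol)

    clusters = [[entry] for entry in book]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(close(a, b) for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Toy signal: two nearby partials early on, plus a short, higher event later.
n = 128
dictionary = build_dictionary(n, scales=[32, 64],
                              freqs=[0.05, 0.1, 0.15, 0.2, 0.25, 0.3])
parts = [(2.0, gabor_atom(n, 64, 0, 0.1)),
         (1.5, gabor_atom(n, 64, 0, 0.15)),
         (1.0, gabor_atom(n, 32, 96, 0.3))]
signal = [sum(w * atom[i] for w, atom in parts) for i in range(n)]

book, residual = matching_pursuit(signal, dictionary, n_iter=8)
molecules = agglomerate(book, time_tol=16, freq_tol=0.05)
print(len(book), "atoms grouped into", len(molecules), "molecules")
```

Note that the clustering step only ever sees the book of atom parameters, which is far smaller than the signal itself, but it runs as a second pass after the decomposition has finished. A joint approach would instead fold the grouping rule into the atom-selection loop of the pursuit.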
