Sparse Approximations for Drum Sound Classification

To date, little work has been done on using features from temporal approximations of signals for audio recognition. Time-frequency tradeoffs are an important issue in signal processing; depending on the dictionary, sparse representations using overcomplete dictionaries can offer more time-frequency flexibility than the standard short-time Fourier transform. Moreover, spectral feature methods cannot capture the precise temporal structure of signals. Here, we present a biologically inspired three-step process for audio classification: 1) Efficient atomic functions are learned in an unsupervised manner on mixtures of percussion sounds (drum phrases), optimizing the length as well as the shape of the atoms. 2) An analog spike model is used to sparsely approximate percussion sound signals (bass drum, snare drum, hi-hat). The spike model consists of temporally shifted versions of the learned atomic functions, each having a precise temporal position and amplitude. Given a set of atomic functions, the decomposition is obtained with matching pursuit. 3) Features are extracted from the resulting spike representation of the signal. In a 3-class database transfer task, our method achieves a classification accuracy of 87.8% with a support vector machine (SVM). Using gammatone functions instead of the learned sparse functions yields an even better classification rate of 97.6%. Tests on sounds containing additive white Gaussian noise show that sparse approximation features are far more robust to such distortions than our benchmark feature set of timbre descriptor (TD) features.
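To make step 2 concrete, the following is a minimal sketch of matching pursuit over temporally shifted atoms, written in Python with NumPy. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`gammatone_atom`, `matching_pursuit`), the fixed number of spikes as a stopping rule, the atom duration, the filter order, and the ERB-based bandwidth formula are all choices made for this example. Atoms are normalized to unit norm so that the correlation value at the selected shift is the spike amplitude.

```python
import numpy as np


def gammatone_atom(fc, sr, duration=0.025, order=4):
    """Unit-norm gammatone kernel (hypothetical helper for this sketch).

    Shape: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).
    The bandwidth b uses a common ERB-based rule; this is an assumption.
    """
    t = np.arange(int(round(duration * sr))) / sr
    erb = 24.7 * (4.37e-3 * fc + 1.0)
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def matching_pursuit(signal, atoms, n_spikes=10):
    """Greedy decomposition into shifted, scaled unit-norm atoms.

    Returns a list of spikes (atom_index, sample_position, amplitude)
    and the final residual. Stopping after n_spikes is an assumption;
    a residual-energy threshold would work as well.
    """
    residual = np.asarray(signal, dtype=float).copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, atom in enumerate(atoms):
            if len(atom) > len(residual):
                continue
            # Inner product of the residual with the atom at every shift.
            corr = np.correlate(residual, atom, mode="valid")
            pos = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[pos]) > abs(best[2]):
                best = (k, pos, corr[pos])
        if best is None:
            break
        k, pos, amp = best
        # Subtract the selected atom's contribution from the residual.
        residual[pos:pos + len(atoms[k])] -= amp * atoms[k]
        spikes.append((k, pos, float(amp)))
    return spikes, residual


if __name__ == "__main__":
    sr = 16000
    atoms = [gammatone_atom(fc, sr) for fc in (200.0, 2000.0)]
    x = np.zeros(1000)
    x[100:100 + len(atoms[0])] += 2.0 * atoms[0]
    spikes, residual = matching_pursuit(x, atoms, n_spikes=1)
    print(spikes, np.linalg.norm(residual))
```

Each returned spike carries an atom index, a sample position, and an amplitude, which corresponds to the spike representation from which features are extracted in step 3. Learned atoms could be passed in place of the gammatone kernels, provided they are normalized to unit norm.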
