Histogram of gradients of Time-Frequency Representations for Audio scene detection

This paper presents our entry to the Detection and Classification of Acoustic Scenes challenge. The approach we propose for classifying acoustic scenes transforms the audio signal into a time-frequency representation and then extracts features that describe the shapes and evolution of time-frequency structures. These features are based on histograms of gradients, which are subsequently fed to a multi-class linear support vector machine.
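The feature-extraction part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the short-time Fourier transform stands in for whichever time-frequency representation is actually used, and the function names (`spectrogram`, `hog_features`), frame sizes, cell grid, and number of orientation bins are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT, shape (frequency bins, time frames).
    Assumption: a plain STFT is used here as a stand-in representation."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)

def hog_features(image, n_bins=8, cells=(4, 4)):
    """Treat the time-frequency image like a picture: histogram the
    gradient orientations (weighted by gradient magnitude) in each cell
    of a grid, then concatenate and L2-normalise the histograms."""
    d_freq, d_time = np.gradient(image)  # derivatives along frequency and time
    magnitude = np.hypot(d_freq, d_time)
    # Unsigned orientation folded into [0, pi)
    orientation = np.mod(np.arctan2(d_freq, d_time), np.pi)
    bin_index = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    row_edges = np.linspace(0, image.shape[0], cells[0] + 1).astype(int)
    col_edges = np.linspace(0, image.shape[1], cells[1] + 1).astype(int)
    histograms = []
    for r in range(cells[0]):
        for c in range(cells[1]):
            rows = slice(row_edges[r], row_edges[r + 1])
            cols = slice(col_edges[c], col_edges[c + 1])
            histograms.append(np.bincount(bin_index[rows, cols].ravel(),
                                          weights=magnitude[rows, cols].ravel(),
                                          minlength=n_bins))
    features = np.concatenate(histograms)
    return features / (np.linalg.norm(features) + 1e-12)

# Synthetic one-second test signal with a rising frequency (illustrative only)
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * (200 + 400 * t) * t)

feats = hog_features(spectrogram(signal))
print(feats.shape)  # (128,) = 4 x 4 cells x 8 orientation bins
```

In a full system, one such feature vector per recording would be passed to a multi-class linear SVM; an off-the-shelf option would be scikit-learn's `LinearSVC`, which is not shown here.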
