Label Tree Embeddings for Acoustic Scene Classification

This paper presents an efficient approach to acoustic scene classification that exploits the structure of the class labels. Given a set of class labels, a category taxonomy is learned automatically by collectively optimizing a clustering of the labels into multiple meta-classes arranged in a tree structure. An acoustic scene instance is then embedded into a low-dimensional feature representation consisting of the likelihoods that it belongs to each of the meta-classes. We demonstrate state-of-the-art results on two acoustic scene classification datasets: DCASE 2013 and LITIS Rouen.
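The two steps described above, building a label tree and embedding an instance as meta-class likelihoods, can be illustrated with a minimal sketch. It is not the paper's implementation and rests on simplifying assumptions: each split is seeded by the two farthest class centroids rather than found by the optimized clustering the abstract mentions, and the likelihoods come from a distance-based soft score rather than trained classifiers. All function names and the toy two-dimensional data are hypothetical.

```python
import math

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_labels(labels, centroids):
    """Split labels into two meta-classes (assumed stand-in for the learned clustering)."""
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    # Seed the two meta-classes with the two class centroids that lie farthest apart.
    seed_l, seed_r = max(pairs, key=lambda p: dist(centroids[p[0]], centroids[p[1]]))
    left = [c for c in labels
            if dist(centroids[c], centroids[seed_l]) <= dist(centroids[c], centroids[seed_r])]
    right = [c for c in labels if c not in left]
    if not right:  # degenerate case: all centroids coincide
        right = [left.pop()]
    return left, right

def build_tree(labels, centroids):
    """Return the split nodes of the label tree as (left_center, right_center) pairs."""
    if len(labels) < 2:
        return []
    left, right = split_labels(labels, centroids)
    node = (mean([centroids[c] for c in left]), mean([centroids[c] for c in right]))
    return [node] + build_tree(left, centroids) + build_tree(right, centroids)

def embed(x, nodes):
    """Concatenate, for every split node, the soft scores of its two meta-classes."""
    out = []
    for left_c, right_c in nodes:
        # Logistic of the distance difference; clamped to avoid overflow in exp().
        z = max(-50.0, min(50.0, dist(x, left_c) - dist(x, right_c)))
        p_left = 1.0 / (1.0 + math.exp(z))
        out += [p_left, 1.0 - p_left]
    return out

# Toy training data: scene label -> list of feature vectors (made up for illustration).
train = {
    "bus":   [[0.0, 0.1], [0.2, 0.0]],
    "tram":  [[1.0, 0.0], [1.2, 0.2]],
    "park":  [[9.0, 9.0], [9.2, 8.8]],
    "beach": [[10.0, 10.0], [10.2, 9.8]],
}
centroids = {c: mean(v) for c, v in train.items()}
nodes = build_tree(list(train), centroids)
embedding = embed([0.1, 0.1], nodes)
print(len(nodes), [round(p, 3) for p in embedding])
```

In this sketch a tree over C classes has C - 1 split nodes, so the embedding has 2(C - 1) dimensions regardless of the input feature size. A final classifier would then be trained on these embeddings in place of the raw features.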
