Audio Classification Using Class-Specific Learned Descriptors

This paper presents a classification scheme for audio signals using high-level feature descriptors. The descriptor is designed to capture the relevance of each acoustic feature group (i.e., a feature set such as mel-frequency cepstral coefficients or perceptual features) in recognizing an audio class. To this end, a bank of relevance vector machine (RVM) classifiers is trained, one for each ‘audio class’–‘feature group’ pair. The responses of an input signal to this bank of RVM classifiers form the entries of the descriptor. Each entry of the descriptor thus measures the proximity of the input signal to one audio class based on a single feature group. This form of signal representation offers two advantages. First, it helps to determine the effectiveness of each feature group in classifying a specific audio class. Second, the descriptor offers higher discriminability than the low-level feature groups, and a simple support vector machine (SVM) classifier trained on the descriptor outperforms several state-of-the-art methods.
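The descriptor construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, feature-group names, and data are hypothetical, and a simple Gaussian-centroid score stands in for each RVM classifier's probabilistic response.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all names and sizes hypothetical): 3 audio classes, 2 feature
# groups, synthetic per-group feature vectors.
n_classes, n_train = 3, 60
groups = {"mfcc": 13, "perceptual": 8}

# Synthetic training data: class c is centered at value c in every dimension.
y_train = rng.integers(0, n_classes, n_train)
X_train = {g: rng.normal(y_train[:, None], 1.0, (n_train, d))
           for g, d in groups.items()}

# One scorer per ('audio class', 'feature group') pair. The paper trains an
# RVM for each pair; here a class centroid per group stands in for it.
centroids = {(c, g): X_train[g][y_train == c].mean(axis=0)
             for c in range(n_classes) for g in groups}

def descriptor(x_by_group):
    """Stack the bank's responses into one high-level descriptor.

    Entry (c, g) measures the proximity of the input to class c using
    only feature group g, via a Gaussian kernel around the centroid.
    """
    return np.array([np.exp(-0.5 * np.sum((x_by_group[g] - centroids[(c, g)]) ** 2))
                     for c in range(n_classes) for g in groups])

# A test signal from class 1 (here, its exact class center): its descriptor
# responds most strongly in the class-1 entries, which the final classifier
# (an SVM in the paper) can exploit.
x_test = {g: np.full(d, 1.0) for g, d in groups.items()}
desc = descriptor(x_test)                    # length = n_classes * n_groups
pred = np.argmax([desc[c * len(groups): (c + 1) * len(groups)].sum()
                  for c in range(n_classes)])
print(pred)  # → 1
```

The descriptor has one entry per class–group pair, so its dimensionality is the number of classes times the number of feature groups, independent of the (typically much larger) raw feature dimensions.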
