Image Representation of the Subband Power Distribution for Robust Sound Classification

This paper proposes a robust sound event classification method based on a selective image feature derived from the novel subband power distribution (SPD), which represents the distribution of power over frequency components. This method is an extension of our previous work, which was motivated by the visual perception of the spectrogram to produce a robust feature for sound classification. Unlike the conventional spectrogram, the proposed SPD representation is invariant to time shifting and is therefore suitable for real scenarios where the detected sound clips are not always balanced. Furthermore, we develop a missing feature classification method, which automatically selects the sparse, representative areas of the signal from the noisy SPD image of the sound clip. The method is tested on a large database containing 50 sound classes, under four different noise environments, varying from clean to severe noise conditions. A significant improvement in performance was obtained in mismatched conditions, with an average classification accuracy of 87.5% in the 0 dB noise condition.
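
To make the time-shift invariance concrete, the following is a minimal sketch of one way an SPD-like representation could be computed. It is not the paper's implementation: the function name `subband_power_distribution`, the histogram estimator, the number of bins, the dB range, and the toy spectrogram are all assumptions chosen for illustration. The idea it shows is only that, for each subband, the distribution of log-power values is collected over all frames, so the order of the frames does not matter.

```python
import math


def subband_power_distribution(spectrogram, n_bins=8, floor_db=-80.0, ceil_db=0.0):
    """Illustrative SPD-like image (assumed estimator, not the paper's).

    spectrogram: list of frames, each a list of non-negative power values,
                 one per subband.
    Returns one normalized histogram of log-power per subband.
    """
    n_frames = len(spectrogram)
    n_subbands = len(spectrogram[0])
    width = (ceil_db - floor_db) / n_bins
    counts = [[0] * n_bins for _ in range(n_subbands)]
    for frame in spectrogram:
        for band, power in enumerate(frame):
            # Convert power to dB, guarding against log(0), then clip to range.
            level = 10.0 * math.log10(max(power, 1e-12))
            level = min(max(level, floor_db), ceil_db)
            # Map the clipped level to a histogram bin index.
            index = min(int((level - floor_db) / width), n_bins - 1)
            counts[band][index] += 1
    # Normalize so each subband's histogram sums to one.
    return [[c / n_frames for c in row] for row in counts]


# Toy spectrogram (hypothetical values): 6 frames, 3 subbands.
clip = [
    [1e-6, 1e-7, 1e-8],
    [1e-2, 1e-5, 1e-8],
    [5e-1, 1e-3, 1e-7],
    [1e-1, 1e-2, 1e-7],
    [1e-4, 1e-4, 1e-8],
    [1e-6, 1e-7, 1e-8],
]

# Circular shift by two frames, used here as a stand-in for time shifting.
shifted = clip[-2:] + clip[:-2]

spd_original = subband_power_distribution(clip)
spd_shifted = subband_power_distribution(shifted)
print(spd_original == spd_shifted)
```

Because the histogram discards frame order, the two representations are identical, whereas the two spectrograms themselves differ frame by frame. A circular shift is a simplification; a real shifted clip would also gain or lose some background frames at its edges.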
