Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions

In this letter, we present a novel feature extraction method for sound event classification, based on the visual signature extracted from the sound's time-frequency representation. The motivation stems from the fact that spectrograms form recognisable images that can be identified by a human reader, with perception enhanced by pseudo-coloration of the image. The signal processing in our method is as follows. 1) The spectrogram is normalised into greyscale with a fixed range. 2) The dynamic range is quantized into regions, each of which is then mapped to form a monochrome image. 3) The monochrome images are partitioned into blocks, and the distribution statistics in each block are extracted to form the feature. The robustness of the proposed method comes from the fact that noise is typically more diffuse than the signal, so its effect is largely confined to a particular quantization region, leaving the other regions less changed. The method is tested on a database of 60 sound classes containing a mixture of collision, action and characteristic sounds, and shows a significant improvement over other methods in mismatched conditions, without the need for noise reduction.
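The three processing steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the quantization mapping, the number of regions, the block grid, or which distribution statistics are used. The choices below (three equal-width regions, a clipped linear mapping per region, a 3x3 block grid, and per-block mean and standard deviation) and all function names are assumptions made for illustration only.

```python
import numpy as np


def normalise_greyscale(spec):
    """Step 1: scale a spectrogram into the fixed greyscale range [0, 1]."""
    spec = np.asarray(spec, dtype=float)
    lo, hi = spec.min(), spec.max()
    if hi == lo:
        # Flat input: no dynamic range to normalise.
        return np.zeros_like(spec)
    return (spec - lo) / (hi - lo)


def quantise_regions(grey, n_regions=3):
    """Step 2: split the dynamic range into regions, one monochrome image each.

    Assumption: equal-width regions with a clipped linear mapping, so each
    image only varies where the greyscale value falls inside its region.
    """
    edges = np.linspace(0.0, 1.0, n_regions + 1)
    images = [
        np.clip((grey - a) / (b - a), 0.0, 1.0)
        for a, b in zip(edges[:-1], edges[1:])
    ]
    return np.stack(images)  # shape: (n_regions, n_freq, n_time)


def block_statistics(image, n_blocks=(3, 3)):
    """Step 3: partition an image into blocks and collect per-block statistics.

    Assumption: mean and standard deviation stand in for the
    "distribution statistics" mentioned in the abstract.
    """
    feats = []
    for rows in np.array_split(image, n_blocks[0], axis=0):
        for block in np.array_split(rows, n_blocks[1], axis=1):
            feats.extend([block.mean(), block.std()])
    return np.array(feats)


def spectrogram_image_feature(spec, n_regions=3, n_blocks=(3, 3)):
    """Chain the three steps and concatenate the statistics of every region."""
    grey = normalise_greyscale(spec)
    mono = quantise_regions(grey, n_regions)
    return np.concatenate([block_statistics(m, n_blocks) for m in mono])


# Usage with a random stand-in for a real spectrogram (64 bins x 100 frames).
rng = np.random.default_rng(0)
feature = spectrogram_image_feature(rng.random((64, 100)))
print(feature.shape)  # 3 regions x 9 blocks x 2 statistics
```

Under these assumptions, diffuse low-level noise mainly alters the image for the lowest region, while the images for the higher regions, where strong signal components sit, change less. This mirrors the robustness argument given in the abstract.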
