Kernel discriminant analysis for environmental sound recognition based on acoustic subspace

In this paper, we propose a discriminant subspace learning framework for recognizing environmental sounds. First, the Gabor transform is adopted to characterize the time-frequency distributions of environmental sounds. We then encode the prominent time-frequency patterns in a low-rank representation by extracting a subspace from the Gabor spectrogram. Unlike conventional sound recognition schemes, which are mostly based on acoustic feature vectors, we treat the acoustic subspaces (matrices) as the basic elements for recognition, which retains rich temporal-spectral contextual information. At the recognition stage, we employ kernel Fisher discriminant analysis to exploit the class-conditional distributions of environmental sounds, which is favorable for multi-class classification. With a well-designed kernel function, the proposed approach achieved superior recognition performance on the RWCP sound scene database compared with existing methods.
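The subspace-as-element idea can be illustrated with a minimal sketch. The abstract does not specify the kernel or the subspace dimension, so everything below is an assumption made for illustration: a random matrix stands in for a Gabor magnitude spectrogram, the subspace is taken as the leading left singular vectors, and the projection kernel (sum of squared cosines of the principal angles) is used as one common kernel between subspaces. The function names `acoustic_subspace`, `projection_kernel`, and `principal_angles` are hypothetical.

```python
import numpy as np

def acoustic_subspace(spectrogram, rank):
    """Orthonormal basis (freq_bins x rank) of the dominant column space,
    taken from the leading left singular vectors (an assumed choice)."""
    U, _, _ = np.linalg.svd(spectrogram, full_matrices=False)
    return U[:, :rank]

def principal_angles(U1, U2):
    """Principal angles between two subspaces given orthonormal bases."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def projection_kernel(U1, U2):
    """Projection kernel: ||U1^T U2||_F^2, i.e. the sum of squared
    cosines of the principal angles (assumed, not the paper's kernel)."""
    return np.linalg.norm(U1.T @ U2, "fro") ** 2

rng = np.random.default_rng(0)
# Stand-ins for magnitude spectrograms: 64 frequency bins x 100 frames.
spec_a = np.abs(rng.standard_normal((64, 100)))
spec_b = np.abs(rng.standard_normal((64, 100)))

U_a = acoustic_subspace(spec_a, rank=5)
U_b = acoustic_subspace(spec_b, rank=5)

k_aa = projection_kernel(U_a, U_a)  # equals the subspace dimension
k_ab = projection_kernel(U_a, U_b)  # lies between 0 and the dimension
angles = principal_angles(U_a, U_b)
```

A Gram matrix built from such kernel values over a training set is the kind of input a kernel Fisher discriminant analysis would operate on; the actual kernel used in the paper may differ.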
