Acoustic feature extraction by statistics based local binary pattern for environmental sound classification

Classification of environmental sounds is a fundamental step in a wide range of real-world applications. In this paper, we propose a novel acoustic feature extraction method for classifying environmental sounds. The method is motivated by an image processing technique, the local binary pattern (LBP), and operates on a spectrogram, which forms two-dimensional (time-frequency) data much like an image. Because the spectrogram contains noisy pixel values, improving classification performance requires features that are robust to fluctuations in those values. We incorporate local statistics, namely the mean and standard deviation of local pixels, to construct a robust LBP. In addition, we present an L2-Hellinger normalization that can be applied efficiently to the proposed features to further enhance their discriminative power while increasing robustness. In experiments on environmental sound classification using the RWCP dataset, which contains 105 sound categories, the proposed method achieves superior performance (98.62%) compared with the other methods, showing significant improvements over the standard LBP method as well as robustness to noise and low computation time.
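The general idea can be illustrated with a minimal sketch; it is not the authors' implementation. The abstract does not give the exact formulation, so the following are assumptions: the 3x3 neighbourhood, the threshold `mean + k * std` (the parameter `k` is hypothetical), the 256-bin histogram, and the reading of "L2-Hellinger normalization" as an element-wise square root followed by L2 normalization. All function names are illustrative.

```python
import numpy as np

# Offsets of the 8 neighbours around a pixel, in clockwise order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(spec, use_local_stats=False, k=0.0):
    """Return 8-bit LBP codes for the interior pixels of a 2-D array.

    use_local_stats=False: standard LBP, neighbours thresholded by the centre.
    use_local_stats=True: ASSUMED variant, neighbours thresholded by the
    local mean plus k times the local standard deviation of the 3x3 patch.
    """
    spec = np.asarray(spec, dtype=float)
    h, w = spec.shape
    center = spec[1:-1, 1:-1]
    # Stack the 8 shifted views so that neigh[i] holds the i-th neighbour.
    neigh = np.stack([spec[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                      for dy, dx in OFFSETS])
    if use_local_stats:
        patch = np.concatenate([neigh, center[None]], axis=0)  # 9 values per pixel
        threshold = patch.mean(axis=0) + k * patch.std(axis=0)
    else:
        threshold = center
    bits = (neigh >= threshold).astype(np.int64)
    weights = (1 << np.arange(8)).reshape(8, 1, 1)
    return (bits * weights).sum(axis=0)


def lbp_histogram(codes):
    """Count occurrences of each of the 256 possible codes."""
    return np.bincount(codes.ravel(), minlength=256).astype(float)


def hellinger_l2_normalize(hist, eps=1e-12):
    """ASSUMED normalization: element-wise square root, then unit L2 norm."""
    rooted = np.sqrt(hist)
    return rooted / (np.linalg.norm(rooted) + eps)


# Usage on a random array standing in for a log-magnitude spectrogram.
rng = np.random.default_rng(0)
fake_spectrogram = rng.random((64, 100))
codes = lbp_codes(fake_spectrogram, use_local_stats=True, k=0.0)
feature = hellinger_l2_normalize(lbp_histogram(codes))
```

Thresholding against a statistic of the whole patch, rather than against the single centre pixel, is one way to make the binary code less sensitive to noise in any one pixel, which is the robustness argument made in the abstract.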
