Environmental sound classification based on time-frequency representation

This paper proposes a feature extraction method for environmental sound event classification based on a time-frequency representation such as the spectrogram. The method has three stages. First, the input signal is converted into a spectrogram image using the short-time Fourier transform. Second, local binary pattern (LBP) features are extracted from this spectrogram at three different radii and neighborhood sizes, and the three resulting features are concatenated into a single feature vector. Finally, a multiclass support vector machine classifies the environmental sound event. The method is evaluated on the ESC-10 dataset.
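The feature extraction stages described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the frame length, hop size, Hann window, log compression, the three (neighbors, radius) pairs (8, 1), (16, 2), (24, 3), the rotation-invariant uniform LBP variant, and nearest-pixel neighbor sampling are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT spectrogram, shape (frequency bins, frames).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def lbp_histogram(image, n_points, radius):
    """Normalized histogram of rotation-invariant uniform LBP codes.

    Neighbors lie on a circle of integer `radius` and are sampled at the
    nearest pixel (an assumed simplification; interpolation is also common).
    """
    h, w = image.shape
    center = image[radius:h - radius, radius:w - radius]
    bits = np.empty((n_points,) + center.shape, dtype=np.int64)
    for k in range(n_points):
        angle = 2.0 * np.pi * k / n_points
        dy = int(round(-radius * np.sin(angle)))
        dx = int(round(radius * np.cos(angle)))
        neighbor = image[radius + dy: h - radius + dy,
                         radius + dx: w - radius + dx]
        bits[k] = neighbor >= center
    # Number of 0/1 transitions around the circular pattern.
    transitions = np.sum(bits != np.roll(bits, 1, axis=0), axis=0)
    # Uniform patterns (at most 2 transitions) are coded by their count of
    # ones (0..n_points); all other patterns share the bin n_points + 1.
    codes = np.where(transitions <= 2, bits.sum(axis=0), n_points + 1)
    hist = np.bincount(codes.ravel(), minlength=n_points + 2).astype(float)
    return hist / hist.sum()


def extract_features(signal, scales=((8, 1), (16, 2), (24, 3))):
    """Concatenate LBP histograms computed at three (n_points, radius) scales."""
    spec = spectrogram(signal)
    return np.concatenate([lbp_histogram(spec, p, r) for p, r in scales])


# Usage on a synthetic one-second signal (assumed 8 kHz sampling rate).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
demo_signal = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(8000)
demo_features = extract_features(demo_signal)
```

With these assumed settings each clip yields a 54-dimensional vector (10 + 18 + 26 histogram bins). For the final stage, one such vector per clip could be passed to any multiclass SVM implementation together with the ESC-10 class labels.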
