Sound-event classification using pseudo-color CENTRIST feature and classifier selection

Sound-event classification often extracts features from an image-like spectrogram. Recent approaches, such as the spectrogram image feature and the subband-power-distribution image feature, extract local statistics such as the mean and variance from the spectrogram. We argue that such simple image statistics cannot adequately capture the complex texture details of the spectrogram. We therefore propose to extract pseudo-color CENTRIST features from the logarithm of a Gammatone-like spectrogram. To classify sound events reliably under unknown noise conditions, we also propose a classifier-selection scheme that automatically selects the most suitable classifier. The proposed approach is compared with the state of the art on the RWCP database and achieves superior performance.
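As a rough illustration of the feature described above, the following Python sketch computes a CENTRIST descriptor (a histogram of 8-bit census-transform codes) on each channel of a pseudo-colored log-spectrogram. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the jet-like piecewise-linear color mapping, the bit ordering, and the use of a random array in place of a Gammatone-like spectrogram are all hypothetical choices made for this example.

```python
import numpy as np

# Neighbor offsets (dy, dx) in raster order; the first neighbor maps to the
# most significant bit. This ordering is an assumption of the sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def census_transform(img):
    """Return the 8-bit census-transform code of every interior pixel."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # A bit is set when the center is greater than or equal to the neighbor.
        codes += (center >= neighbor).astype(np.int64) << (7 - bit)
    return codes


def centrist_histogram(img):
    """Normalized 256-bin histogram of census-transform codes."""
    codes = census_transform(img)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def pseudo_color(gray):
    """Map a 2-D array to three channels with a jet-like mapping (assumed)."""
    gray = np.asarray(gray, dtype=float)
    span = max(gray.max() - gray.min(), 1e-12)  # avoid division by zero
    x = (gray - gray.min()) / span              # normalize to [0, 1]
    red = np.clip(1.5 - np.abs(4 * x - 3), 0, 1)
    green = np.clip(1.5 - np.abs(4 * x - 2), 0, 1)
    blue = np.clip(1.5 - np.abs(4 * x - 1), 0, 1)
    return np.stack([red, green, blue], axis=-1)


def pseudo_color_centrist(power_spec, eps=1e-10):
    """Concatenate per-channel CENTRIST histograms of the log-spectrogram."""
    log_spec = np.log(np.asarray(power_spec, dtype=float) + eps)
    rgb = pseudo_color(log_spec)
    return np.concatenate([centrist_histogram(rgb[..., c]) for c in range(3)])


# Usage: a random non-negative array stands in for a Gammatone-like
# spectrogram with 64 frequency bands and 100 frames.
rng = np.random.default_rng(0)
spec = rng.random((64, 100))
feature = pseudo_color_centrist(spec)
print(feature.shape)
```

Because the census transform only compares each pixel with its neighbors, the resulting codes are unchanged by any monotonically increasing transformation of a channel, which is one reason such descriptors are attractive for texture-like inputs. The resulting 768-dimensional vector could then be passed to any standard classifier.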
