Acoustic Event Detection Using Fuzzy Integral Ensemble and Oriented Fuzzy Local Binary Pattern Encoded CNN

In this paper, we propose a novel ensemble classifier for acoustic event detection (AED) based on Convolutional Neural Networks (CNNs) trained on spectrograms encoded with an Oriented Fuzzy Local Binary Pattern. CNNs have been widely used for acoustic event detection, with a spectrogram image of the acoustic signal as input. The effectiveness of a CNN depends on the representation of the spectrogram images used during training. We propose the Oriented Fuzzy Local Binary Pattern (OFLBP), which extracts directional texture features from the spectrogram image by inspecting neighboring pixels located at different angles from a central pixel. The proposed OFLBP technique is capable of dealing with the uncertainty present in the spectrogram image. The outputs of the trained CNNs are combined into an ensemble using a fuzzy integral method. Experimental results show that the proposed method outperforms existing AED methods on the ESC-50 dataset.
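To illustrate the fusion step, the following is a minimal sketch of how class scores from several classifiers can be combined with a fuzzy integral. The abstract does not state which fuzzy integral is used or how the classifier importances are chosen, so this sketch assumes a Sugeno integral over a Sugeno λ-fuzzy measure, with hand-picked densities strictly between 0 and 1. All function names (`solve_lambda`, `sugeno_integral`, `fuse`) and all numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def solve_lambda(densities):
    """Find lam > -1 with prod(1 + lam * g_i) = 1 + lam (Sugeno lambda-measure).

    Assumes every density g_i lies strictly between 0 and 1.
    """
    g = np.asarray(densities, dtype=float)
    total = g.sum()
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities already sum to 1: the measure is additive

    def f(lam):
        return np.prod(1.0 + lam * g) - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -1e-7  # the non-zero root lies in (-1, 0)
    else:
        lo, hi = 1e-7, 1.0    # the non-zero root lies in (0, inf)
        while f(hi) < 0.0:
            hi *= 2.0
    f_lo = f(lo)
    for _ in range(200):      # plain bisection
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def sugeno_integral(scores, densities, lam):
    """Sugeno integral of per-classifier scores for a single class."""
    h = np.asarray(scores, dtype=float)
    g = np.asarray(densities, dtype=float)
    measure, best = 0.0, 0.0
    for idx in np.argsort(-h):  # visit classifiers from highest score down
        # Measure of the set of classifiers visited so far.
        measure = g[idx] + measure + lam * g[idx] * measure
        best = max(best, min(h[idx], measure))
    return best


def fuse(prob_matrix, densities):
    """Fuse a (n_classifiers, n_classes) score matrix; return scores and label."""
    probs = np.asarray(prob_matrix, dtype=float)
    lam = solve_lambda(densities)
    fused = np.array([sugeno_integral(probs[:, c], densities, lam)
                      for c in range(probs.shape[1])])
    return fused, int(np.argmax(fused))


# Hypothetical softmax outputs of three CNNs over three classes.
cnn_outputs = [[0.6, 0.3, 0.1],
               [0.2, 0.7, 0.1],
               [0.5, 0.4, 0.1]]
importances = [0.4, 0.3, 0.2]  # assumed densities, e.g. from validation accuracy
fused_scores, predicted_class = fuse(cnn_outputs, importances)
print(fused_scores, predicted_class)
```

Unlike a weighted average, the fuzzy measure assigns an importance to every subset of classifiers, so the fused score reflects both how confident the classifiers are and how much the agreeing group is trusted.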
