Audio event classification using deep neural networks

This paper presents our work on the classification of outdoor audio events. We use a deep neural network (DNN) as the main classifier and compare it with other classification methods. We propose a novel improvement to the network's pre-training process that is useful when training on Gaussian data. Our experiments use an audio corpus extracted from the FreeSound.org repository. The results show that the DNN has some advantage over the other classification methods and that fusing two methods produces the best results.
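The abstract does not state which fusion rule is used. As a minimal sketch of one common option, the Python below averages the per-class scores of two classifiers with a weight and picks the top class. The function names, the weight of 0.5, and the example scores are all hypothetical and are not taken from the paper.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted average of two per-class score lists for one audio clip.

    Assumption: both classifiers output comparable scores (e.g. posteriors)
    over the same ordered set of classes. `weight` is the share given to
    the first classifier.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(scores_a, scores_b)]


def predict(scores):
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical posteriors over three event classes for a single clip.
dnn_scores = [0.40, 0.35, 0.25]    # illustrative DNN output
other_scores = [0.20, 0.60, 0.20]  # illustrative second classifier output

fused = fuse_scores(dnn_scores, other_scores)
label = predict(fused)
```

In this made-up example the DNN alone would pick class 0, while the fused scores favour class 1. In practice the weight would be tuned on held-out data.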
