DOMESTIC AUDIO TAGGING WITH CONVOLUTIONAL NEURAL NETWORKS

This paper describes the method used in our submission to DCASE2016 challenge task 4 (domestic audio tagging). We investigate the use of convolutional neural networks (CNNs) to label audio signals recorded in a domestic (home) environment. On the challenge development dataset, the approach achieves a relative improvement of 23.8% over the Gaussian mixture model (GMM) baseline.
