Sound Event Localization and Detection Using CRNN on Pairs of Microphones

This paper proposes a method for sound event localization and detection from multichannel recordings. The proposed system relies on two Convolutional Recurrent Neural Networks (CRNNs) that perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array. The system is evaluated with a four-microphone array, and therefore combines the results from the six microphone pairs to produce a final classification and a 3-D direction of arrival (DOA) estimate. Results show that the proposed approach outperforms the DCASE 2019 baseline system.
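To illustrate how pairwise TDOAs can be merged into a single 3-D DOA, the sketch below enumerates the six pairs of a four-microphone array and solves a far-field least-squares problem. It is only a minimal illustration and not necessarily the combination rule used in the paper. The microphone coordinates, the speed of sound, the sign convention (the TDOA of pair (i, j) is the arrival time at microphone i minus the arrival time at microphone j), and the function names `microphone_pairs` and `doa_from_tdoas` are all assumptions introduced here.

```python
from itertools import combinations
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def microphone_pairs(n_mics):
    """Return all unordered microphone pairs (i, j) with i < j."""
    return list(combinations(range(n_mics), 2))


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def doa_from_tdoas(mic_positions, tdoas, c=SPEED_OF_SOUND):
    """Least-squares unit vector pointing from the array toward the source.

    Assumed far-field model: tdoa_ij = (p_j - p_i) . u / c, with tdoas
    listed in the same order as microphone_pairs(len(mic_positions)).
    """
    pairs = microphone_pairs(len(mic_positions))
    # One baseline vector (p_j - p_i) per microphone pair.
    rows = [[pj - pi for pi, pj in zip(mic_positions[i], mic_positions[j])]
            for i, j in pairs]
    rhs = [c * t for t in tdoas]
    # Normal equations (A^T A) u = A^T y for the overdetermined system.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    aty = [sum(r[a] * y for r, y in zip(rows, rhs)) for a in range(3)]
    det = _det3(ata)
    u = []
    for k in range(3):  # Cramer's rule on the 3x3 system
        m = [row[:] for row in ata]
        for a in range(3):
            m[a][k] = aty[a]
        u.append(_det3(m) / det)
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


# Hypothetical tetrahedral array (coordinates in metres) and a simulated source.
mics = [(0.04, 0.04, 0.04), (0.04, -0.04, -0.04),
        (-0.04, 0.04, -0.04), (-0.04, -0.04, 0.04)]
azimuth, elevation = math.radians(30.0), math.radians(20.0)
true_u = [math.cos(elevation) * math.cos(azimuth),
          math.cos(elevation) * math.sin(azimuth),
          math.sin(elevation)]
simulated_tdoas = [
    sum((pj - pi) * u for pi, pj, u in zip(mics[i], mics[j], true_u)) / SPEED_OF_SOUND
    for i, j in microphone_pairs(len(mics))
]
estimated_u = doa_from_tdoas(mics, simulated_tdoas)
```

With noise-free simulated TDOAs, `estimated_u` should match `true_u` up to floating-point error. In a real system the six TDOAs would come from the network outputs and be noisy, which is where the redundancy of six equations for three unknowns helps.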
