The LOCATA Challenge Data Corpus for Acoustic Source Localization and Tracking

Algorithms for acoustic source localization and tracking are essential for a wide range of applications, such as personal assistants, smart homes, tele-conferencing systems, hearing aids, and autonomous systems. Numerous algorithms have been proposed for this purpose; however, they have so far not been evaluated and compared against each other on a common database. The IEEE-AASP Challenge on sound source localization and tracking (LOCATA) provides a novel, comprehensive data corpus for the objective benchmarking of state-of-the-art algorithms for sound source localization and tracking. The data corpus comprises six tasks, ranging from the localization of a single static sound source with a static microphone array to the tracking of multiple moving speakers with a moving microphone array. It contains real-world multichannel audio recordings obtained in an enclosed acoustic environment by hearing aids, microphones integrated in a robot head, a planar microphone array, and a spherical microphone array, as well as positional information about the arrays and the sound sources, which are represented by moving human talkers or static loudspeakers.
