Paper Information - The LOCATA Challenge: Acoustic Source Localization and Tracking

The LOCATA Challenge: Acoustic Source Localization and Tracking

The ability to localize and track acoustic events is a fundamental prerequisite for equipping machines with the ability to be aware of and engage with humans in their surrounding environment. However, in realistic scenarios, audio signals are adversely affected by reverberation, noise, interference, and periods of speech inactivity. In dynamic scenarios, where the sources and microphone platforms may be moving, the signals are additionally affected by variations in the source-sensor geometries. In practice, approaches to sound source localization and tracking are often impeded by missing estimates of active sources, estimation errors, and false estimates. The aim of the LOCAlization and TrAcking (LOCATA) Challenge is to provide an open-access framework for the objective evaluation and benchmarking of broad classes of algorithms for sound source localization and tracking. This article provides a review of relevant localization and tracking algorithms and, within the context of the existing literature, a detailed evaluation and dissemination of the LOCATA submissions. The evaluation highlights achievements in the field and open challenges, and identifies potential future directions.

[1] Ba-Ngu Vo,et al. OSPA(2): Using the OSPA metric to evaluate multi-target tracking performance , 2017, 2017 International Conference on Control, Automation and Information Sciences (ICCAIS).

[2] Susanto Rahardja,et al. AUC Optimization for Deep Learning Based Voice Activity Detection , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3] Pejman Mowlaee Begzade Mahale,et al. Circular Statistics-based low complexity DOA estimation for hearing aid application , 2018, ArXiv.

[4] M. Ross,et al. Average magnitude difference function pitch extractor , 1974 .

[5] Reinhold Häb-Umbach,et al. Acoustic Microphone Geometry Calibration: An overview and experimental evaluation of state-of-the-art algorithms , 2016, IEEE Signal Processing Magazine.

[6] Reza N. Jazar. Theory of Applied Robotics , 2007 .

[7] Patrick A. Naylor,et al. Locata Challenge-Evaluation Tasks and Measures , 2018, 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC).

[8] T. Kailath,et al. Optimum localization of multiple sources by passive arrays , 1983 .

[9] Yaakov Bar-Shalom,et al. Sonar tracking of multiple targets using joint probabilistic data association , 1983 .

[10] Christophe Beaugeant,et al. Blind estimation of the coherent-to-diffuse energy ratio from noisy speech signals , 2011, 2011 19th European Signal Processing Conference.

[11] Petar M. Djuric,et al. New resampling algorithms for particle filters , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[12] Boaz Rafaely,et al. Spherical Microphone Array Beamforming , 2010 .

[13] I. Cohen,et al. Generating nonstationary multisensor signals under a spatial coherence constraint. , 2008, The Journal of the Acoustical Society of America.

[14] Ronald P. S. Mahler,et al. Statistical Multisource-Multitarget Information Fusion , 2007 .

[15] Jan Wouters,et al. Sound source localization using hearing aids with microphones placed behind-the-ear, in-the-canal, and in-the-pinna , 2011, International journal of audiology.

[16] Kanti V. Mardia,et al. Bayesian analysis for bivariate von Mises distributions , 2010 .

[17] Buket D. Barkana,et al. Voiced/Unvoiced Decision for Speech Signals Based on Zero-Crossing Rate and Energy , 2008, SCSS.

[18] Daniel E. Clark,et al. Incorporating track uncertainty into the OSPA metric , 2011, 14th International Conference on Information Fusion.

[19] Radu Horaud,et al. 2D sound-source localization on the binaural manifold , 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[20] Haizhou Li,et al. A learning-based approach to direction of arrival estimation in noisy and reverberant environments , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21] Archontis Politis,et al. Direction of Arrival Estimation of Reflections from Room Impulse Responses Using a Spherical Microphone Array , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[22] Stefano Soatto,et al. Structure from Motion Causally Integrated Over Time , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[23] Thad Hughes,et al. Recurrent neural networks for voice activity detection , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[24] John Mason,et al. Robust voice activity detection using cepstral features , 1993, Proceedings of TENCON '93. IEEE Region 10 International Conference on Computers, Communications and Automation.

[25] R.P.S. Mahler,et al. "Statistics 101" for multisensor, multitarget data fusion , 2004, IEEE Aerospace and Electronic Systems Magazine.

[26] Rajesh M. Hegde,et al. Near-Field Acoustic Source Localization and Beamforming in Spherical Harmonics Domain , 2016, IEEE Transactions on Signal Processing.

[27] H. Wallach,et al. The role of head movements and vestibular and visual cues in sound localization. , 1940 .

[28] Boaz Rafaely,et al. Fundamentals of Spherical Array Processing , 2015, Springer Topics in Signal Processing.

[29] Ba-Ngu Vo,et al. A Consistent Metric for Performance Evaluation of Multi-Object Filters , 2008, IEEE Transactions on Signal Processing.

[30] Emanuel A. P. Habets,et al. Spotforming: Spatial Filtering With Distributed Arrays for Position-Selective Sound Acquisition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[31] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[32] Steven A. Tretter,et al. Optimum processing for delay-vector estimation in passive signal arrays , 1973, IEEE Trans. Inf. Theory.

[33] Boaz Rafaely,et al. Localization of Multiple Speakers under High Reverberation using a Spherical Microphone Array and the Direct-Path Dominance Test , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[34] Harvey F. Silverman,et al. A Fast Microphone Array SRP-PHAT Source Location Implementation using Coarse-To-Fine Region Contraction(CFRC) , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[35] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.

[36] Jacob Benesty,et al. A Generalized Steered Response Power Method for Computationally Viable Source Localization , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[37] Thia Kirubarajan,et al. Performance measures for multiple target tracking problems , 2011, 14th International Conference on Information Fusion.

[38] Richard C. Hendriks,et al. Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[39] Radu Horaud,et al. Tracking Multiple Audio Sources With the von Mises Distribution and Variational EM , 2018, IEEE Signal Processing Letters.

[40] Patrick A. Naylor,et al. DoA Reliability for Distributed Acoustic Tracking , 2018, IEEE Signal Processing Letters.

[41] Tamir Hazan,et al. Non-negative tensor factorization with applications to statistics and computer vision , 2005, ICML.

[42] R. O. Schmidt,et al. Multiple emitter location and signal Parameter estimation , 1986 .

[43] Martin Vetterli,et al. Acoustic echoes reveal room shape , 2013, Proceedings of the National Academy of Sciences.

[44] Scott Rickard,et al. Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.

[45] Athanasios Mouchtaris,et al. A Survey of Sound Source Localization Methods in Wireless Acoustic Sensor Networks , 2017, Wirel. Commun. Mob. Comput..

[46] Wonyong Sung,et al. A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.

[47] Gaetano Scarano,et al. Discrete time techniques for time delay estimation , 1993, IEEE Trans. Signal Process..

[48] Mingsian R. Bai,et al. Time Difference of Arrival (TDOA)-Based Acoustic Source Localization and Signal Extraction for Intelligent Audio Classification , 2018, 2018 IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM).

[49] Philippe Souères,et al. A survey on sound source localization in robotics: From binaural to array processing methods , 2015, Comput. Speech Lang..

[50] Volker Hohmann,et al. Sound source localization in real sound fields based on empirical statistics of interaural parameters. , 2006, The Journal of the Acoustical Society of America.

[51] Ivan Markovic,et al. Von Mises Mixture PHD Filter , 2015, IEEE Signal Processing Letters.

[52] Branko Ristic,et al. Particle Filters for Random Set Models , 2013 .

[53] Walter Kellermann,et al. EB-ESPRIT: 2D localization of multiple wideband acoustic sources using eigen-beams , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[54] Guy J. Brown,et al. Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[55] François Charpillet,et al. Motion planning for robot audition , 2019, Auton. Robots.

[56] Stefan B. Williams,et al. Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[57] Emanuel A. P. Habets,et al. Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise , 2017, ArXiv.

[58] B. AfeArd. CALCULATING THE SINGULAR VALUES AND PSEUDOINVERSE OF A MATRIX , 2022 .

[59] Mike E. Davies,et al. Latent Variable Analysis and Signal Separation , 2010 .

[60] Rafik A. Goubran,et al. Robust voice activity detection using higher-order statistics in the LPC residual domain , 2001, IEEE Trans. Speech Audio Process..

[61] Radu Horaud,et al. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[62] E. C. Cmm,et al. on the Recognition of Speech, with , 2008 .

[63] S. S. Blackman,et al. Association and Fusion of Multiple Sensor Data , 1990 .

[64] Harold W. Kuhn,et al. The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[65] Brendan J. Frey,et al. Robust variational speech separation using fewer microphones than speakers , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[66] Junichi Yamagishi,et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016 .

[67] Tomohiro Nakatani,et al. Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition , 2012, IEEE Signal Process. Mag..

[68] Radu Horaud,et al. A cascaded multiple-speaker localization and tracking system , 2018, ArXiv.

[69] R. Tucker,et al. Voice activity detection using a periodicity measure , 1992 .

[70] Boaz Rafaely,et al. Speaker localization using the direct-path dominance test for arbitrary arrays , 2018, 2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE).

[71] Boaz Rafaely,et al. Optimal Design of Microphone Array for Humanoid-Robot Audition , 2016 .

[72] Hiroshi G. Okuno,et al. Improved Sound Source Localization and Front-Back Disambiguation for Humanoid Robots with Two Ears , 2013, IEA/AIE.

[73] G. F. Kuhn. Model for the interaural time differences in the azimuthal plane , 1977 .

[74] B.D. Van Veen,et al. Beamforming: a versatile approach to spatial filtering , 1988, IEEE ASSP Magazine.

[75] Walter Kellermann,et al. Hybrid Particle Filtering Based on an Elitist Resampling Scheme , 2018, 2018 IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM).

[76] Alastair H. Moore,et al. Direction of Arrival Estimation in the Spherical Harmonic Domain Using Subspace Pseudointensity Vectors , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[77] Jon Barker,et al. The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[78] Francesco Nesta,et al. Cooperative Wiener-ICA for source localization and Separation by distributed microphone arrays , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[79] Iván V. Meza,et al. Localization of sound sources in robotics: A review , 2017, Robotics Auton. Syst..

[80] Thomas Kailath,et al. ESPRIT-estimation of signal parameters via rotational invariance techniques , 1989, IEEE Trans. Acoust. Speech Signal Process..

[81] Y. Bar-Shalom,et al. The probabilistic data association filter , 2009, IEEE Control Systems.

[82] Ba-Ngu Vo,et al. Performance evaluation of multi-target tracking using the OSPA metric , 2010, 2010 13th International Conference on Information Fusion.

[83] Gene H. Golub,et al. Calculating the singular values and pseudo-inverse of a matrix , 2007, Milestones in Matrix Computation.

[84] G. Carter. Coherence and time delay estimation , 1987, Proceedings of the IEEE.

[85] Josh H. McDermott. The cocktail party problem , 2009, Current Biology.

[86] Søren Holdt Jensen,et al. The single- and multichannel audio recordings database (SMARD) , 2014, 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC).

[87] Norbert Dillier,et al. A fast and accurate “shoebox” room acoustics simulator , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[88] W R Thurlow,et al. Head movements during sound localization. , 1967, The Journal of the Acoustical Society of America.

[89] Walter Kellermann,et al. Minimum Mutual Information-Based Linearly Constrained Broadband Signal Extraction , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[90] Walter Kellermann,et al. BSS for improved interference estimation for Blind speech signal Extraction with two microphones , 2009, 2009 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).

[91] Simon J. Godsill,et al. Acoustic Source Localization and Tracking of a Time-Varying Number of Speakers , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[92] Wen-Jun Zeng,et al. High-Resolution Multiple Wideband and Nonstationary Source Localization With Unknown Number of Sources , 2010, IEEE Transactions on Signal Processing.

[93] Simon J. Godsill,et al. On sequential Monte Carlo sampling methods for Bayesian filtering , 2000, Stat. Comput..

[94] Walter Kellermann,et al. Multidimensional Localization of Multiple Sound Sources Using Blind Adaptive MIMO System Identification , 2006, 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems.

[95] Jacob Benesty,et al. Real-time passive source localization: a practical linear-correction least-squares approach , 2001, IEEE Trans. Speech Audio Process..

[96] Ba-Ngu Vo,et al. Tracking an unknown time-varying number of speakers using TDOA measurements: a random finite set approach , 2006, IEEE Transactions on Signal Processing.

[97] Ronald P. S. Mahler,et al. “Statistics 102” for Multisource-Multitarget Detection and Tracking , 2013, IEEE Journal of Selected Topics in Signal Processing.

[98] Neil J. Gordon,et al. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking , 2002, IEEE Trans. Signal Process..

[99] Ronald P. S. Mahler,et al. Advances in Statistical Multisource-Multitarget Information Fusion , 2014 .

[100] James R. Glass,et al. Sound Event Localization and Detection Using CRNN on Pairs of Microphones , 2019, DCASE.

[101] Boaz Rafaely,et al. Analysis and design of spherical microphone arrays , 2005, IEEE Transactions on Speech and Audio Processing.

[102] Guy J. Brown,et al. Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[103] Maximo Cobos,et al. A Modified SRP-PHAT Functional for Robust Real-Time Sound Source Localization With Scalable Spatial Sampling , 2011, IEEE Signal Processing Letters.

[104] Jacob Benesty,et al. Performance of GCC- and AMDF-Based Time-Delay Estimation in Practical Reverberant Environments , 2005, EURASIP J. Adv. Signal Process..

[105] Darren B. Ward,et al. Particle filtering algorithms for tracking an acoustic source in a reverberant environment , 2003, IEEE Trans. Speech Audio Process..

[106] T. Başar,et al. A New Approach to Linear Filtering and Prediction Problems , 2001 .

[107] Andrea Cavallaro,et al. LOCATA challenge: speaker localization with a planar array , 2019, ArXiv.

[108] F L Wightman,et al. Resolution of front-back ambiguity in spatial hearing by listener and source movement. , 1999, The Journal of the Acoustical Society of America.

[109] P M Zurek,et al. Probability distributions of interaural phase and level differences in binaural detection stimuli. , 1991, The Journal of the Acoustical Society of America.

[110] Daniel P. W. Ellis,et al. Combining localization cues and source model constraints for binaural source separation , 2011, Speech Commun..

[111] B. Rafaely. Plane-wave decomposition of the sound field on a sphere by spherical convolution , 2004 .

[112] W Noble,et al. The contribution of head motion cues to localization of low-pass noise , 1997, Perception & psychophysics.

[113] Jacob Benesty,et al. Broadband Source Localization From an Eigenanalysis Perspective , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[114] Walter Kellermann,et al. WOZ acoustic data collection for interactive TV , 2008, Lang. Resour. Evaluation.

[115] Jacob Benesty,et al. Time-delay estimation via linear interpolation and cross correlation , 2004, IEEE Transactions on Speech and Audio Processing.

[116] Yuxin Zhao,et al. Non-Zero Diffusion Particle Flow SMC-PHD Filter for Audio-Visual Multi-Speaker Tracking , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[117] H. Kuhn. The Hungarian method for the assignment problem , 1955 .

[118] Paris Smaragdis,et al. Position and Trajectory Learning for Microphone Arrays , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[119] Jesper Jensen,et al. Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[120] R. Hansen. Introduction to Synthetic Aperture Sonar , 2011 .

[121] Daniel P. W. Ellis,et al. Model-Based Expectation-Maximization Source Separation and Localization , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[122] Hui Cao,et al. Maximum Likelihood TDOA Estimation From Compressed Sensing Samples Without Reconstruction , 2017, IEEE Signal Processing Letters.

[123] Yang Liu,et al. Intensity Particle Flow SMC-PHD Filter For Audio Speaker Tracking , 2018, ArXiv.

[124] Guy J. Brown,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[125] Sascha Spors,et al. An audio-visual database for evaluating person tracking algorithms , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[126] Peter Gerstoft,et al. Eigenvalues of the sample covariance matrix for a towed array. , 2012, The Journal of the Acoustical Society of America.

[127] D. Leakey. Some Measurements on the Effects of Interchannel Intensity and Time Differences in Two Channel Sound Systems , 1959 .

[128] Alastair H. Moore,et al. Estimation of Room Acoustic Parameters: The ACE Challenge , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[129] Emanuel A. P. Habets,et al. Multiple-Hypothesis Extended Particle Filter for Acoustic Source Localization in Reverberant Environments , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[130] Guy J. Brown,et al. Mask estimation for missing data speech recognition based on statistics of binaural interaction , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[131] Sharon Gannot,et al. Distributed Expectation-Maximization Algorithm for Speaker Localization in Reverberant Environments , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[132] O. Kirkeby,et al. Resolution of front-back confusion in virtual acoustic imaging systems. , 2000, The Journal of the Acoustical Society of America.

[133] Jean-Marc Odobez,et al. AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking , 2004, MLMI.

[134] James R. Hopgood,et al. Particle filtering for TDOA based acoustic source tracking: Nonconcurrent Multiple Talkers , 2014, Signal Process..

[135] Francesco Nesta,et al. Generalized State Coherence Transform for Multidimensional TDOA Estimation of Multiple Sources , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[136] Miriam A. Doron,et al. On direction finding of an emitting source from time delays , 1999 .

[137] Roy Edgar Hansen,et al. Synthetic aperture sonar technology review , 2013 .

[138] E. Langendijk,et al. Contribution of spectral cues to human sound localization. , 1999, The Journal of the Acoustical Society of America.

[139] Yuuki Tachioka. Dnn-Based Voice Activity Detection Using Auxiliary Speech Models in Noisy Environments , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[140] Jont B. Allen,et al. Image method for efficiently simulating small‐room acoustics , 1976 .

[141] Rainer Stiefelhagen,et al. The CLEAR 2006 Evaluation , 2006, CLEAR.

[142] M.P. Hayes,et al. Synthetic Aperture Sonar: A Review of Current Status , 2009, IEEE Journal of Oceanic Engineering.

[143] Benesty,et al. Adaptive eigenvalue decomposition algorithm for passive acoustic source localization , 2000, The Journal of the Acoustical Society of America.

[144] Walter Kellermann,et al. Detection and localization of multiple wideband acoustic sources based on wavefield decomposition using spherical apertures , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[145] Boaz Rafaely,et al. Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[146] John W. McDonough,et al. Tracking Multiple Speakers with Probabilistic Data Association Filters , 2006, CLEAR.

[147] Thippur V. Sreenivas,et al. TDOA-Based Multiple Acoustic Source Localization Without Association Ambiguity , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[148] Ying Yu,et al. Performance of real-time source-location estimators for a large-aperture microphone array , 2005, IEEE Transactions on Speech and Audio Processing.

[149] G. Carter,et al. The generalized correlation method for estimation of time delay , 1976 .

[150] Antoine Deleforge,et al. Evaluation of an open-source implementation of the SRP-PHAT algorithm within the 2018 LOCATA challenge , 2018, ArXiv.

[151] Wonyong Sung,et al. A voice activity detector employing soft decision based noise spectrum adaptation , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[152] Jacob Benesty,et al. Broadband Music: Opportunities and Challenges for Multiple Source Localization , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[153] Ji Wu,et al. Efficient Multiple Kernel Support Vector Machine Based Voice Activity Detection , 2011, IEEE Signal Processing Letters.

[154] Aaron E. Rosenberg,et al. A comparative performance study of several pitch detection algorithms , 1976 .

[155] Alastair H. Moore. Multiple source direction of arrival estimation using subspace pseudointensity vectors , 2018, ArXiv.

[156] Y. Bar-Shalom,et al. Tracking in a cluttered environment with probabilistic data association , 1975, Autom..

[157] Michael S. Brandstein,et al. Robust Localization in Reverberant Rooms , 2001, Microphone Arrays.

[158] Kazunori Komatani,et al. Unsupervised adaptation of deep neural networks for sound source localization using entropy minimization , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[159] Masayuki Inaba,et al. Motion Planning for Humanoid Robots , 2003, ISRR.

[160] Walter Kellermann,et al. TRINICON: a versatile framework for multichannel blind signal processing , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[161] Alastair H. Moore,et al. The ACE challenge — Corpus description and performance evaluation , 2015, 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[162] Patrick A. Naylor,et al. Acoustic SLAM , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[163] Emanuel A. P. Habets,et al. Theory and Applications of Spherical Microphone Array Processing , 2016 .

[164] Nicolas Obin,et al. Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[165] Ehud Weinstein,et al. Signal enhancement using beamforming and nonstationarity with applications to speech , 2001, IEEE Trans. Signal Process..

[166] Ángel F. García-Fernández,et al. Generalized optimal sub-pattern assignment metric , 2016, 2017 20th International Conference on Information Fusion (Fusion).

[167] Athanasios Mouchtaris,et al. 3D DOA estimation of multiple sound sources based on spatially constrained beamforming driven by intensity vectors , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[168] Nando de Freitas,et al. Sequential Monte Carlo Methods in Practice , 2001, Statistics for Engineering and Information Science.

[169] Boaz Rafaely,et al. Open-Sphere Designs for Spherical Microphone Arrays , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[170] T. MacRobert. Spherical harmonics : an elementary treatise on harmonic functions , 1927 .

[171] R. Maas,et al. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research , 2016, EURASIP Journal on Advances in Signal Processing.

[172] Radu Horaud,et al. Learning the Direction of a Sound Source Using Head Motions and Spectral Features , 2011 .

[173] Harry L. Van Trees,et al. Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory , 2002 .

[174] Eric A. Lehmann,et al. Particle Filter Design Using Importance Sampling for Acoustic Source Localisation and Tracking in Reverberant Environments , 2006, EURASIP J. Adv. Signal Process..

[175] Walter Kellermann,et al. Simultaneous localization of multiple sound sources using blind adaptive MIMO filtering , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[176] Sharon Gannot,et al. Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[177] Paris Smaragdis,et al. A Wrapped Kalman Filter for Azimuthal Speaker Tracking , 2013, IEEE Signal Processing Letters.

[178] Emanuel A. P. Habets,et al. Simulating room impulse responses for spherical microphone arrays , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[179] D.J. Salmond,et al. Mixture Reduction Algorithms for Point and Extended Object Tracking in Clutter , 2009, IEEE Transactions on Aerospace and Electronic Systems.

[180] Hiroshi Sawada,et al. Direction of arrival estimation for multiple source signals using independent component analysis , 2003, Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings..

[181] Carlo Drioli,et al. Localization and Tracking of an Acoustic Source using a Diagonal Unloading Beamforming and a Kalman Filter , 2018, ArXiv.

[182] Branko Ristic,et al. A Metric for Performance Evaluation of Multi-Target Tracking Algorithms , 2011, IEEE Transactions on Signal Processing.

[183] Jingdong Chen,et al. Microphone Array Signal Processing , 2008 .

[184] Radu Horaud,et al. Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments , 2018, IEEE Journal of Selected Topics in Signal Processing.

[185] Christopher V. Alvino,et al. Geometric source separation: merging convolutive source separation with geometric beamforming , 2001, Neural Networks for Signal Processing XI: Proceedings of the 2001 IEEE Signal Processing Society Workshop (IEEE Cat. No.01TH8584).

[186] Kazunori Komatani,et al. Sound source localization based on deep neural networks with directional activate function exploiting phase information , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[187] Radu Horaud,et al. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[188] Anna Freud,et al. Design And Analysis Of Modern Tracking Systems , 2016 .

[189] Antoine Deleforge,et al. VAST: The Virtual Acoustic Space Traveler Dataset , 2016, LVA/ICA.

[190] Radu Horaud,et al. RAVEL: an annotated corpus for training robots with audiovisual abilities , 2013, Journal on Multimodal User Interfaces.

[191] Antoine Deleforge,et al. DREGON: Dataset and Methods for UAV-Embedded Sound Source Localization , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[192] D. Wang,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.

[193] Patrick A. Naylor,et al. Optimized Self-Localization for SLAM in Dynamic Scenes Using Probability Hypothesis Density Filters , 2018, IEEE Transactions on Signal Processing.

[194] Rainer Martin,et al. Binaural Source Localization based on Modulation-Domain Features and Decision Pooling , 2018, ArXiv.

[195] Walter Kellermann,et al. TDOA Estimation for Multiple Sound Sources in Noisy and Reverberant Environments Using Broadband Independent Component Analysis , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[196] Sharon Gannot,et al. Blind Synchronization in Wireless Acoustic Sensor Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[197] Roger M. Goodall,et al. Estimation of parameters in a linear state space model using a Rao-Blackwellised particle filter , 2004 .

[198] Srdan Kitic,et al. TRAMP: Tracking by a Real-time AMbisonic-based Particle filter , 2018, ArXiv.

[199] Walter Kellermann,et al. Learning-Based Acoustic Source-Microphone Distance Estimation Using the Coherent-to-Diffuse Power Ratio , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[200] Michael Syskind Pedersen,et al. Bias-Compensated Informed Sound Source Localization Using Relative Transfer Functions , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[201] Maurizio Omologo,et al. The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[202] Patrick A. Naylor,et al. The LOCATA Challenge Data Corpus for Acoustic Source Localization and Tracking , 2018, 2018 IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM).

[203] Emanuel A. P. Habets,et al. Broadband doa estimation using convolutional neural networks trained with noise signals , 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).

[204] W. R. Howard,et al. Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness , 2005 .

[205] James R. Hopgood,et al. Nonconcurrent multiple speakers tracking based on extended Kalman particle filter , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[206] Oliver E. Drummond,et al. Performance metrics for multiple-sensor multiple-target tracking , 2000, SPIE Defense + Commercial Sensing.

[207] Petar M. Djuric,et al. Resampling Methods for Particle Filtering: Classification, implementation, and strategies , 2015, IEEE Signal Processing Magazine.

[208] Walter Kellermann,et al. Evolutionary Resampling for Multi-Target Tracking using Probability Hypothesis Density Filter , 2018, 2018 26th European Signal Processing Conference (EUSIPCO).

[209] Branko Ristic,et al. Beyond the Kalman Filter: Particle Filters for Tracking Applications , 2004 .

[210] Michael S. Brandstein,et al. A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[211] Kevin Wilson,et al. Speech Source Separation by Combining Localization Cues with Mixture Models of Speech Spectra , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[212] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization , 1983 .

[213] Patrick A. Naylor,et al. Source tracking using moving microphone arrays for robot audition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[214] Harald Viste,et al. Binaural Source Localization by Joint Estimation of ILD and ITD , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[215] F. Wightman,et al. The dominant role of low-frequency interaural time differences in sound localization. , 1992, The Journal of the Acoustical Society of America.

[216] Carlo Drioli,et al. A Low-Complexity Robust Beamforming Using Diagonal Unloading for Acoustic Source Localization , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[217] Ivan Dokmanic,et al. Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[218] Daniel C. Marcus,et al. 46 – Acoustic Transduction , 2001 .

[219] Boaz Rafaely,et al. Description of algorithms for Ben-Gurion University Submission to the LOCATA challenge , 2018, ArXiv.

[220] Sharon Gannot,et al. Semi-Supervised Sound Source Localization Based on Manifold Regularization , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[221] Sharon Gannot,et al. Time difference of arrival estimation of speech source in a noisy and reverberant environment , 2005, Signal Process..

[222] Sharon Gannot,et al. Combined LCMV-TRINICON Beamforming for Separating Multiple Speech Sources in Noisy and Reverberant Environments , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.