Audio informed visual speaker tracking with SMC-PHD filter

The sequential Monte Carlo probability hypothesis density (SMC-PHD) filter has received much interest in nonlinear, non-Gaussian visual tracking because it can handle a variable number of speakers. The SMC-PHD filter uses surviving, spawned and born particles to model the state of the speakers, and jointly estimates the number of speakers and their states. The born particles play a critical role in detecting new speakers, so they are normally propagated in every frame, which increases the computational cost of the visual tracker. Here, we propose to use audio data to determine when to propagate the born particles and to re-allocate the surviving and spawned particles. In our framework, audio data aids the visual SMC-PHD (V-SMC-PHD) filter: the direction of arrival (DOA) angles of the audio sources are used to reshape the distribution of the particles. Experimental results on multi-speaker sequences from the AV16.3 dataset show that the proposed audio-visual SMC-PHD (AV-SMC-PHD) filter improves tracking performance in terms of both estimation accuracy and computational efficiency.
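The two roles the abstract gives to the audio data (deciding when born particles are needed, and reshaping the existing particle distribution) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each particle and each tracked speaker has already been mapped to a bearing angle in radians, and the function names, the Gaussian kernel, and the `tol` and `sigma` values are all hypothetical choices made for illustration.

```python
import math


def angle_diff(a, b):
    """Smallest signed difference between two angles, in radians."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


def unexplained_doas(doa_angles, speaker_angles, tol=0.1):
    """Return the DOA angles that no tracked speaker accounts for.

    Assumption: born particles are only worth propagating when this
    list is non-empty, i.e. audio suggests a source the tracker lacks.
    """
    return [
        d for d in doa_angles
        if all(abs(angle_diff(d, s)) > tol for s in speaker_angles)
    ]


def reshape_weights(particle_angles, weights, doa_angles, sigma=0.1):
    """Concentrate particle weight near the DOA angles.

    Each weight is scaled by a Gaussian kernel centred on the nearest
    DOA (an illustrative choice). The total weight is preserved, since
    in a PHD filter it represents the expected number of targets.
    """
    if not doa_angles:
        return list(weights)
    scaled = [
        w * max(math.exp(-0.5 * (angle_diff(p, d) / sigma) ** 2)
                for d in doa_angles)
        for p, w in zip(particle_angles, weights)
    ]
    total_before = sum(weights)
    total_after = sum(scaled)
    if total_after == 0.0:
        return list(weights)  # kernel underflowed; keep the visual weights
    return [s * total_before / total_after for s in scaled]
```

With one tracked speaker near 0.52 rad and DOAs at 0.5 and 2.0 rad, `unexplained_doas([0.5, 2.0], [0.52])` flags only the 2.0 rad source, so born particles would be propagated in that frame and skipped in frames where every DOA is already explained.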
