Seeing the Sound: A New Multimodal Imaging Device for Computer Vision

Audio imaging can play a fundamental role in computer vision, particularly in automated surveillance, where it can boost the accuracy of current systems based on standard optical cameras. We present a new hybrid device for acoustic-optic imaging whose characteristics are tailored to automated surveillance. The device generates an acoustic map in real time and at a high frame rate, and overlays it on a standard optical image using a geometric calibration of the audio and video streams. We demonstrate the potential of the device for target tracking on three challenging setups, showing the advantages of acoustic images over baseline image-tracking algorithms. In particular, the proposed approach outperforms, often dramatically, visual tracking with state-of-the-art algorithms, dealing efficiently with occlusions, abrupt variations in visual appearance, and camouflage. These results pave the way to widespread use of acoustic imaging in application scenarios such as surveillance and security.
