Dynamical information fusion of heterogeneous sensors for 3D tracking using particle swarm optimization

This paper presents a new method for three-dimensional object tracking that fuses information from stereo vision and stereo audio. Directional information about an object is extracted from the audio data by Generalized Cross Correlation (GCC), and the object's position in the video data is detected using the Continuously Adaptive Mean Shift (CAMShift) method. The resulting localization estimates, combined with confidence measurements, are then fused to track the object using Particle Swarm Optimization (PSO). In our approach, the particles move in 3D space and iteratively evaluate their current position against the localization estimates of the audio and video modules and their confidences, which allows the object's three-dimensional position to be determined directly. Unlike classical methods, this technique has low computational complexity, and its tracking performance does not depend on any kind of model, statistics, or assumptions. The confidence measurements further increase the robustness and reliability of the entire tracking system and allow an adaptive, dynamic fusion of heterogeneous sensor information.
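The fusion step described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that the audio module yields a unit direction vector from the origin of the microphone pair, that the video module yields a triangulated 3D point, and that each comes with a confidence in [0, 1]. The cost function `fusion_cost`, the function `pso_track`, and all parameter values (inertia `w`, acceleration constants `c1` and `c2`, search bounds) are hypothetical choices made for illustration only.

```python
import random


def fusion_cost(p, audio_dir, video_pos, conf_audio, conf_video):
    """Hypothetical confidence-weighted mismatch between point p and both cues."""
    # Audio term: squared perpendicular distance from p to the line through
    # the origin along audio_dir (audio_dir is assumed to be a unit vector).
    dot = sum(a * b for a, b in zip(p, audio_dir))
    norm2 = sum(a * a for a in p)
    perp2 = max(norm2 - dot * dot, 0.0)
    # Video term: squared Euclidean distance to the video position estimate.
    dist2 = sum((a - b) ** 2 for a, b in zip(p, video_pos))
    return conf_audio * perp2 + conf_video * dist2


def pso_track(audio_dir, video_pos, conf_audio, conf_video,
              n_particles=30, n_iter=100, bounds=(-5.0, 5.0), seed=0):
    """Standard global-best PSO over 3D positions; returns (best_pos, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants

    def cost(p):
        return fusion_cost(p, audio_dir, video_pos, conf_audio, conf_video)

    # Particles start at random positions in the search volume, at rest.
    pos = [[rng.uniform(lo, hi) for _ in range(3)] for _ in range(n_particles)]
    vel = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(3):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search volume.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost
```

In this sketch, lowering `conf_audio` toward zero makes the swarm settle on the video estimate alone, and vice versa, which is one simple way to obtain the adaptive weighting of the two sensors that the abstract describes.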
