Multi-target tracking through opportunistic camera control in a resource-constrained multimodal sensor network

While wide-area video surveillance is an important application, it is often not practical, from both technical and social perspectives, to have video cameras that completely cover the entire region of interest. Obtaining good surveillance results in a sparse camera network requires that the cameras be complemented by additional sensors of different modalities, that these sensors be intelligently assigned in a dynamic environment, and that the scene be understood using these multimodal inputs. In this paper, we propose a probabilistic scheme for opportunistically deploying cameras to the most interesting parts of a scene, given data from a set of video and audio sensors. The audio data is continuously processed to identify interesting events, e.g., entry/exit of people, merging or splitting of groups, and so on; these events indicate the time instants at which to turn on the cameras. Thereafter, analysis of the video determines how long the cameras stay on and whether their pan/tilt/zoom parameters change. Events are tracked continuously by combining the audio and video data. Correspondences between the audio and video sensor observations are obtained through a learned homography between the image plane and the ground plane. The method leads to efficient usage of the camera resources by focusing on the most important parts of the scene; it saves power, bandwidth, and cost, and reduces privacy concerns. We show detailed experimental results on real data collected in multimodal networks.
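The audio-to-video correspondence step above maps an event localised on the ground plane into camera pixel coordinates via a learned planar homography. As a minimal sketch of that mapping, the snippet below applies a 3x3 homography with a homogeneous multiply followed by de-homogenisation. The matrix `H` here is purely illustrative (the paper's homography would be learned from point correspondences), and `ground_to_image` is a hypothetical helper name, not from the original work.

```python
import numpy as np

# Illustrative (not learned) homography H mapping ground-plane
# coordinates, e.g. metres, to image-plane pixel coordinates.
H = np.array([
    [120.0,   0.0, 320.0],
    [  0.0, -80.0, 480.0],
    [  0.0,   0.1,   1.0],
])

def ground_to_image(points_xy, H):
    """Map N x 2 ground-plane points to N x 2 pixel coordinates
    by a projective homography: lift to homogeneous coordinates,
    multiply by H, then divide out the third coordinate."""
    pts = np.asarray(points_xy, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homo = np.hstack([pts, ones])      # N x 3 homogeneous points
    proj = homo @ H.T                  # apply the homography
    return proj[:, :2] / proj[:, 2:3]  # de-homogenise

# E.g. an audio event localised at ground position (1.0, 2.0):
pix = ground_to_image([[1.0, 2.0]], H)
```

In practice such an `H` would be fit from image/ground correspondences (for instance by direct linear transformation), after which audio localisations can be projected into each camera's view to decide where to point it.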
