Object Detection, Tracking and Recognition for Multiple Smart Cameras

Video cameras are among the most commonly used sensors in a large number of applications, ranging from surveillance to smart rooms for videoconferencing. There is a need to develop algorithms for tasks such as detection, tracking, and recognition of objects, specifically using distributed networks of cameras. The projective nature of imaging sensors poses significant challenges for data association across cameras. We first discuss the nature of these challenges in the context of visual sensor networks. Then, we show how real-world constraints can be exploited to tackle these challenges. Examples of such constraints are (a) the presence of a world plane, (b) the presence of a three-dimensional scene model, (c) consistency of motion across cameras, and (d) color and texture properties. In this regard, the main focus of this paper is to highlight the efficient use of the geometric constraints induced by the imaging devices to derive distributed algorithms for target detection, tracking, and recognition. Our discussions are supported by several examples drawn from real applications. Lastly, we describe several potential research problems that remain to be addressed.
