Correspondence-free multi-camera activity analysis and scene modeling

We propose a novel approach for activity analysis in multiple synchronized but uncalibrated static camera views. We assume that the topology of the camera views is unknown and arbitrary, that the fields of view covered by these cameras may have no overlap or any amount of overlap, and that objects may move on different ground planes. Using low-level cues, objects are tracked in each camera view independently, and the positions and velocities of objects along their trajectories are computed as features. Under a generative model, our approach jointly learns the distribution of an activity in the feature spaces of the different camera views. It accomplishes two tasks: (1) grouping trajectories in different camera views that belong to the same activity into one cluster; (2) modeling paths commonly taken by objects across camera views. To our knowledge, no prior results on co-clustering trajectories in multiple camera views have been published. The advantages of this approach are that it does not require first solving the challenging correspondence problem and that the learning is unsupervised. Our approach is evaluated on two very large data sets with 22,951 and 14,985 trajectories.
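As a rough illustration of the per-view feature step described above, the following sketch computes a position and a finite-difference velocity for each observation along a single tracked trajectory. The `(time, x, y)` track format, the function name `trajectory_features`, and the use of simple backward differences are assumptions made here for illustration; the abstract does not specify how the features are actually computed or quantized.

```python
from typing import List, Tuple

# Hypothetical track format: one (time, x, y) sample per observation,
# in image coordinates of a single camera view.
Point = Tuple[float, float, float]
# Feature per observation: (x, y, vx, vy).
Feature = Tuple[float, float, float, float]


def trajectory_features(track: List[Point]) -> List[Feature]:
    """Return position and backward-difference velocity along one trajectory."""
    features: List[Feature] = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            # Skip duplicate or out-of-order timestamps.
            continue
        features.append((x1, y1, (x1 - x0) / dt, (y1 - y0) / dt))
    return features


# Toy trajectory with three observations, one per time unit.
track = [(0.0, 10.0, 20.0), (1.0, 12.0, 20.0), (2.0, 14.0, 21.0)]
features = trajectory_features(track)
```

Each camera view would produce its own feature lists in its own image coordinates, which is consistent with the abstract's point that no cross-camera correspondence is needed before learning.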
