Foreground Regions Extraction and Characterization Towards Real-Time Object Tracking

Object localization and tracking are key issues in scene analysis for video surveillance and scene understanding applications. This paper presents a contribution to object tracking in indoor environments monitored by multiple fixed cameras. The proposed method first applies a foreground separation process to each camera view. A 3D foreground scene is then modeled and discretized into voxels using all the segmented views, which avoids the inter-object occlusion problems of 2D trackers and increases robustness by not relying on a single view. The voxels are grouped into meaningful blobs, whose colors are modeled for tracking purposes using a novel voxel-coloring technique that accounts for possible inter- and intra-object occlusions. Finally, color information, together with other characteristic features of 3D object appearance, is tracked over time using a template-based technique that takes all the features into account simultaneously, in accordance with their respective variances. Extensive experiments on several hours of video sequences in real-world scenarios have been conducted, showing promising performance.
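The step that builds a 3D foreground from the segmented views can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes calibrated cameras given as 3x4 projection matrices and binary foreground masks indexed as `mask[row][col]`. The names `project`, `carve_voxels`, and `min_views` are hypothetical and chosen only for this illustration. A voxel is kept when its center projects onto foreground pixels in enough views.

```python
import math


def project(P, point):
    """Project a 3D point with a 3x4 camera matrix P (list of 3 rows).

    Returns pixel coordinates (u, v), or None if the point is not in
    front of the camera (non-positive homogeneous depth).
    """
    x, y, z = point
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def carve_voxels(voxel_centers, cameras, masks, min_views=None):
    """Keep voxels whose centers fall on foreground in at least min_views views.

    voxel_centers: iterable of (x, y, z) tuples (assumed world coordinates).
    cameras:       list of 3x4 projection matrices, one per view.
    masks:         list of binary foreground masks, masks[i][row][col].
    min_views:     assumed parameter; defaults to all views, a lower value
                   tolerates segmentation misses in some views.
    """
    if min_views is None:
        min_views = len(cameras)
    kept = []
    for center in voxel_centers:
        votes = 0
        for P, mask in zip(cameras, masks):
            uv = project(P, center)
            if uv is None:
                continue
            # Nearest-pixel lookup of the projected voxel center.
            col = math.floor(uv[0] + 0.5)
            row = math.floor(uv[1] + 0.5)
            if 0 <= row < len(mask) and 0 <= col < len(mask[0]) and mask[row][col]:
                votes += 1
        if votes >= min_views:
            kept.append(center)
    return kept
```

Requiring agreement across all views gives the tightest volume but is sensitive to a missed detection in any single camera; lowering `min_views` trades tightness for robustness. Grouping the kept voxels into blobs and coloring them would follow as separate steps.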
