WHAT CAN CASUAL WALKERS TELL US ABOUT A 3D SCENE? By Diego Rother

This paper presents an approach for incrementally learning a 3D scene from a single static video camera. In particular, we exploit the presence of people casually walking through the scene to infer relative depth, learn shadows, and segment the critical ground structure. Because this type of video data is ubiquitous, this work is an important step towards 3D scene analysis from single cameras in readily available ordinary videos and movies. On-line 3D scene learning, as presented here, is important for applications such as scene analysis, foreground refinement, tracking, biometrics, automated camera collaboration, activity analysis, identification, and real-time computer graphics. The main contributions of this work are two-fold. First, we use the people in the scene to continuously learn and update the 3D scene parameters using an incremental robust (L1) error minimization. Second, models of the shadows in the scene are learned using a statistical framework. The symbiotic relationship between the shadow model and the estimated scene geometry is exploited for incremental mutual improvement. We illustrate the effectiveness of the proposed framework with applications in foreground refinement, automatic segmentation and relative depth mapping of the floor/ground, and estimation of the 3D trajectories of people in the scene.
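To make the idea of a robust (L1) fit to walker observations concrete, here is a minimal sketch, not the paper's actual formulation. It assumes a simplified model in which the image height `h` of a walking person is roughly linear in the image row `y` of their feet, and it fits that line in batch with iteratively reweighted least squares (IRLS) as a stand-in for the incremental minimization described above. The function names, the linear model, and the sample numbers are all illustrative assumptions.

```python
def fit_l1_line(ys, hs, iters=50, eps=1e-6):
    """Fit h ~ a*y + b, approximately minimizing sum |h - (a*y + b)| via IRLS.

    ys: image rows of detected feet positions (hypothetical input)
    hs: image heights of the corresponding person blobs (hypothetical input)
    """
    w = [1.0] * len(ys)  # first pass is ordinary least squares
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Weighted sums for the 2x2 normal equations.
        sw = sum(w)
        sy = sum(wi * y for wi, y in zip(w, ys))
        sh = sum(wi * h for wi, h in zip(w, hs))
        syy = sum(wi * y * y for wi, y in zip(w, ys))
        syh = sum(wi * y * h for wi, y, h in zip(w, ys, hs))
        det = sw * syy - sy * sy
        if abs(det) < 1e-12:
            break  # degenerate configuration; keep the last estimate
        a = (sw * syh - sy * sh) / det
        b = (syy * sh - sy * syh) / det
        # Weights 1/|residual| turn squared error into (approximate) L1 error,
        # so gross outliers (e.g. a blob merged with its shadow) count less.
        w = [1.0 / max(abs(h - (a * y + b)), eps) for y, h in zip(ys, hs)]
    return a, b


def relative_depth(y, a, b):
    """Depth up to an unknown scale: inversely proportional to the image
    height predicted for a person standing at row y (assumed model)."""
    return 1.0 / (a * y + b)


if __name__ == "__main__":
    # Synthetic observations: heights follow 0.5*(y - 100), with one outlier.
    rows = [200, 250, 300, 350, 400, 450]
    heights = [50, 75, 200, 125, 150, 175]  # the entry at row 300 is corrupted
    a, b = fit_l1_line(rows, heights)
    print("slope:", a, "intercept:", b)
    print("relative depth at row 200 vs 400:",
          relative_depth(200, a, b), relative_depth(400, a, b))
```

The L1 weighting is the point of the sketch: a single badly segmented walker has little influence on the fitted line, whereas an ordinary least-squares fit would be pulled towards it. An incremental variant would update the weighted sums as new walkers are observed instead of refitting from scratch.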
