Learning multi-planar scene models in multi-camera videos

Many man-made environments are constructed with multiple levels where people walk, joined by stairs, ramps and overpasses. This study proposes a novel method to learn the geometry of a scene containing more than one ground plane by tracking pedestrians and combining information from multiple views. The method estimates a scene model with multiple planes by measuring how pedestrian heights vary across each camera's field of view. It segments the image into separate plane regions and estimates the relative depth and altitude of each image pixel, thus building a three-dimensional reconstruction of the scene. With the multiple planes estimated, tracking algorithms can follow objects (pedestrians, vehicles or both) that move on different ground planes in the scene. The authors also introduce what they believe is the first public dataset with pedestrian traffic on multiple planes, to encourage other researchers to compare their work in this field.
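
As a rough illustration of how height variation can separate planes, the sketch below assumes that the image height of a walking pedestrian varies approximately linearly with the image position of their feet while they stay on one plane, so each plane corresponds to one model h = a*x + b*y + c. It groups observations (x, y, h) with a simple sequential RANSAC. This is not the authors' algorithm: the linear model, the sequential RANSAC strategy, the function names (`fit_height_model`, `segment_planes`) and all thresholds are assumptions chosen for illustration.

```python
import random

def solve3(M, v):
    """Solve a 3x3 linear system by Cramer's rule; return None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(M)
    if abs(d) < 1e-9:
        return None
    sol = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = v[r]
        sol.append(det(Mc) / d)
    return sol

def fit_height_model(obs):
    """Least-squares fit of h = a*x + b*y + c via the normal equations."""
    rows = [(x, y, 1.0) for x, y, _ in obs]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * o[2] for r, o in zip(rows, obs)) for i in range(3)]
    return solve3(M, v)

def residual(model, o):
    """Absolute difference between predicted and observed image height."""
    a, b, c = model
    return abs(a * o[0] + b * o[1] + c - o[2])

def segment_planes(obs, max_planes=3, min_inliers=30, tol=2.0, iters=200, seed=0):
    """Sequential RANSAC (an assumed strategy): find one height model at a
    time, label its inliers, then repeat on the remaining observations."""
    rng = random.Random(seed)
    labels = [-1] * len(obs)          # -1 means "not assigned to any plane"
    models = []
    remaining = list(range(len(obs)))
    for k in range(max_planes):
        if len(remaining) < min_inliers:
            break
        best = []
        for _ in range(iters):
            sample = [obs[i] for i in rng.sample(remaining, 3)]
            model = fit_height_model(sample)
            if model is None:         # degenerate (collinear) sample
                continue
            inliers = [i for i in remaining if residual(model, obs[i]) < tol]
            if len(inliers) > len(best):
                best = inliers
        if len(best) < min_inliers:
            break
        refined = fit_height_model([obs[i] for i in best])
        if refined is None:
            break
        models.append(refined)
        for i in best:
            labels[i] = k
        remaining = [i for i in remaining if labels[i] == -1]
    return models, labels
```

Each observation is a tuple `(x, y, h)`: the foot position in pixels and the tracked pedestrian's image height in pixels. The returned `labels` give a plane index per observation, which could then be spread over the image to form plane regions; combining the per-camera results across views is outside the scope of this sketch.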
