Geo-spatial aerial video processing for scene understanding and object tracking

This paper presents an approach to extracting and using semantic layers from low-altitude aerial videos for scene understanding and object tracking. The input video is captured by low-flying aerial platforms and typically exhibits strong parallax from non-ground-plane structures. A key aspect of our approach is the geo-registration of video frames to reference image databases (such as those available from Terraserver and Google satellite imagery) to establish a geo-spatial coordinate system for pixels in the video. Geo-registration enables Euclidean 3D reconstruction with absolute scale, unlike traditional monocular structure from motion, where continuous scale estimation over long periods of time is an issue. Geo-registration also enables correlation of video data with other stored information sources such as GIS (geo-spatial information system) databases. In addition to the geo-registration and 3D reconstruction aspects, the key contributions of this paper include: (1) exploiting appearance and 3D shape constraints derived from geo-registered videos to label structures such as buildings, foliage, and roads for scene understanding, and (2) eliminating errors in moving-object detection and tracking using 3D parallax constraints and semantic labels derived from geo-registered videos. Experimental results on extended-time aerial video data demonstrate the qualitative and quantitative aspects of our work.
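To illustrate what a geo-spatial coordinate system for pixels provides, the following is a minimal sketch, not the method of this paper. It assumes that registration of one frame has already produced a 3x3 homography from image pixels to ground-plane map coordinates in metres, which can only hold for pixels on the ground plane (structures with parallax would need the 3D reconstruction). The function name `pixel_to_geo`, the matrix `H`, and all numeric values are hypothetical.

```python
import numpy as np

def pixel_to_geo(H, pixels):
    """Map (N, 2) pixel coordinates to ground-plane map coordinates
    using a 3x3 homography H (assumed to come from geo-registration)."""
    pixels = np.asarray(pixels, dtype=float)
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])        # homogeneous pixels, shape (N, 3)
    mapped = homog @ H.T                     # apply H to every pixel
    return mapped[:, :2] / mapped[:, 2:3]    # divide by the projective scale

# Hypothetical registration result: 0.5 m per pixel, map origin at
# (easting 1000 m, northing 2000 m), image y axis pointing south.
H = np.array([[0.5,  0.0, 1000.0],
              [0.0, -0.5, 2000.0],
              [0.0,  0.0,    1.0]])

# Two ground-plane pixels, e.g. one tracked object in two frames
# that share this registration.
a, b = pixel_to_geo(H, [[100, 200], [140, 230]])
distance_m = np.linalg.norm(b - a)           # displacement in metres
```

Because the mapped coordinates are metric, displacements and speeds of tracked objects can be expressed in absolute units rather than pixels, which is the practical benefit of absolute scale noted above.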
