Object Detection in Multi-view 3D Reconstruction Using Semantic and Geometric Context

We present a method for object detection in a multi-view 3D model. We use highly overlapping views, geometric data, and semantic surface classification to boost existing 2D detection algorithms. Specifically, a 3D model is computed from the overlapping views and segmented into semantic labels using height information, color, and planarity. A 2D detector is run on all images, and the resulting detections are mapped into 3D via the model. The detections are then clustered in 3D and represented by 3D boxes. Finally, the detections, visibility maps, and semantic labels are combined using a Support Vector Machine to achieve a more robust object detector.
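The clustering step can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or box parameterization is used, so everything here is an assumption made for illustration: the function names `cluster_detections` and `bounding_box`, the `radius` parameter, the single-linkage grouping rule, the axis-aligned boxes, and the sample points are all hypothetical.

```python
from itertools import combinations
from math import dist


def cluster_detections(points, radius):
    """Group 3D points into single-linkage clusters (hypothetical rule):
    two points share a cluster if a chain of points links them with
    consecutive distances of at most `radius`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


def bounding_box(cluster):
    """Axis-aligned 3D box as (min_corner, max_corner); an assumed
    representation, not necessarily the one used in the paper."""
    xs, ys, zs = zip(*cluster)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Made-up 3D locations of 2D detections after mapping them onto the model.
points = [(0, 0, 0), (0.5, 0, 0), (0.4, 0.3, 0.1), (10, 10, 0), (10.2, 9.9, 0.1)]
clusters = cluster_detections(points, radius=1.0)
boxes = sorted(bounding_box(c) for c in clusters)
```

With these sample points, the three detections near the origin fall into one box and the two near (10, 10, 0) into another. In a full pipeline, per-box quantities such as the number of supporting views could then serve as inputs to the final classifier.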
