Samantha: Towards Automatic Image-Based Model Acquisition

In this paper we describe SAMANTHA, a structure-and-motion pipeline from images that is both more robust and computationally cheaper than current competing approaches. Pictures are organized into a hierarchical tree with single images as leaves and partial reconstructions as internal nodes. The method proceeds bottom-up until it reaches the root node, which corresponds to the final result. This framework is one order of magnitude faster than sequential approaches, inherently parallel, less sensitive to the error accumulation that causes drift, and truly uncalibrated: it does not need EXIF metadata to be present in the pictures. We have verified the quality of our reconstructions both qualitatively, by producing compelling point clouds, and quantitatively, by comparing them with laser scans serving as ground truth. We also show how to automatically extract a meaningful collection of planar patches, obtaining a compact, stable representation of scenes.
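
The tree organization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how images are grouped, so the pairwise `affinity` scores (for example, a normalized count of matched features), the single-linkage merge rule, and the names `Node` and `build_tree` are all assumptions made for illustration. The geometric work done at each internal node (registering and refining partial reconstructions) is omitted; only the bottom-up tree construction is shown.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """A tree node: a leaf holds one image, an internal node a merged cluster."""
    images: FrozenSet[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(affinity: Dict[Tuple[str, str], float]) -> Node:
    """Greedily merge the two most similar clusters until one root remains.

    `affinity` is a hypothetical symmetric score between image pairs;
    missing pairs are treated as 0. At least one pair must be given.
    """
    names = sorted({name for pair in affinity for name in pair})
    clusters = [Node(frozenset([name])) for name in names]  # one leaf per image

    def link(a: Node, b: Node) -> float:
        # Single linkage (an assumption): strongest score across the two clusters.
        return max(
            affinity.get((x, y), affinity.get((y, x), 0.0))
            for x in a.images
            for y in b.images
        )

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        # A real pipeline would merge the two partial reconstructions here.
        clusters.append(Node(a.images | b.images, a, b))
    return clusters[0]


scores = {("a", "b"): 0.9, ("b", "c"): 0.4, ("c", "d"): 0.8, ("a", "d"): 0.1}
root = build_tree(scores)
```

In this toy input the two strongest pairs are merged first and joined only at the root. Sibling subtrees share no images, so they can be processed independently, which is the sense in which a tree-structured pipeline is inherently parallel.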
