Size Does Matter: Improving Object Recognition and 3D Reconstruction with Cross-Media Analysis of Image Clusters

Most of the recent work on image-based object recognition and 3D reconstruction has focused on improving the underlying algorithms. In this paper we present a method to automatically improve the quality of the reference database, which, as we will show, also affects recognition and reconstruction performances significantly. Starting out from a reference database of clustered images we expand small clusters. This is done by exploiting cross-media information, which allows for crawling of additional images. For large clusters redundant information is removed by scene analysis. We show how these techniques make object recognition and 3D reconstruction both more efficient and more precise - we observed up to 14.8% improvement for the recognition task. Furthermore, the methods are completely data-driven and fully automatic.

[1]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[3]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[4]  Yang Song,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[6]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Maarten Vergauwen,et al.  Web-based 3D Reconstruction Service , 2006, Machine Vision and Applications.

[8]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[9]  Richard Szeliski,et al.  Building Rome in a day , 2009, ICCV.

[10]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[13]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[15]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Antonio Criminisi,et al.  Harvesting Image Databases from the Web , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Daniel P. Huttenlocher,et al.  Landmark classification in large-scale image collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  Trevor Darrell,et al.  Autotagging Facebook: Social network context improves photo annotation , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[20]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[21]  Andrew Zisserman,et al.  Object Mining Using a Matching Graph on Very Large Image Collections , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.

[22]  Luc Van Gool,et al.  Mining from large image sets , 2009, CIVR '09.

[23]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[24]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[25]  Andrew J. Davison,et al.  Active Matching , 2008, ECCV.

[26]  Richard Szeliski,et al.  Skeletal graphs for efficient structure from motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Ondřej Chum,et al.  Web Scale Image Clustering Large Scale Discovery of Spatially Related Images , 2008 .

[28]  Axel Pinz,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[29]  Andrew R. Webb,et al.  Statistical Pattern Recognition , 1999 .