3D Object Modeling and Recognition from Photographs and Image Sequences

This chapter proposes a representation of rigid three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide the matching process involved in object modeling and recognition tasks. The proposed approach is applied in two domains: (1) Photographs — models of rigid objects are constructed from small sets of images and recognized in highly cluttered shots taken from arbitrary viewpoints. (2) Video — dynamic scenes containing multiple moving objects are segmented into rigid components, and the resulting 3D models are directly matched to each other, giving a novel approach to video indexing and retrieval.
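The role of the normalized patch representation can be illustrated with a small sketch. It is not the chapter's implementation. It assumes each patch is summarized by a parallelogram (a center and a 2x2 matrix whose columns are its two edge vectors) and that two views of the same planar patch are related by a 2D affine map. The names `rectify`, `shape1`, `A`, and `t`, and all numeric values, are hypothetical. Under these assumptions, a point on the patch has the same coordinates in the patch's normalized frame in both views, which is what lets appearance be compared independently of viewpoint.

```python
from typing import Tuple

Vec = Tuple[float, float]
Mat = Tuple[Vec, Vec]  # 2x2 matrix stored as two rows


def matvec(m: Mat, x: Vec) -> Vec:
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1])


def matmul(a: Mat, b: Mat) -> Mat:
    """Multiply two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse(m: Mat) -> Mat:
    """Invert a 2x2 matrix; raises if it is (numerically) singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("degenerate patch: edge vectors are parallel")
    return ((m[1][1] / det, -m[0][1] / det),
            (-m[1][0] / det, m[0][0] / det))


def add(x: Vec, y: Vec) -> Vec:
    return (x[0] + y[0], x[1] + y[1])


def rectify(shape: Mat, center: Vec, p: Vec) -> Vec:
    """Express image point p in the patch's normalized (rectified) frame."""
    offset = (p[0] - center[0], p[1] - center[1])
    return matvec(inverse(shape), offset)


# View 1: a hypothetical patch and a point lying on it.
shape1: Mat = ((4.0, 1.0), (0.5, 3.0))  # columns are the two edge vectors
center1: Vec = (10.0, 20.0)
p1: Vec = (12.0, 19.0)

# View 2: the same patch seen through an assumed affine map x -> A x + t.
A: Mat = ((1.2, -0.4), (0.3, 0.9))
t: Vec = (5.0, -2.0)
shape2 = matmul(A, shape1)
center2 = add(matvec(A, center1), t)
p2 = add(matvec(A, p1), t)

# The rectified coordinates agree up to floating-point error.
r1 = rectify(shape1, center1, p1)
r2 = rectify(shape2, center2, p2)
print(r1, r2)
```

The agreement follows because the second view's shape matrix is `A` times the first, so the inverse in `rectify` cancels `A` exactly. Real images add noise and perspective effects, so in practice the match would hold only approximately.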
