Paper Information - 3D Object Modeling and Recognition from Photographs and Image Sequences

This chapter proposes a representation of rigid three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide the matching process involved in object modeling and recognition tasks. The proposed approach is applied in two domains: (1) Photographs — models of rigid objects are constructed from small sets of images and recognized in highly cluttered shots taken from arbitrary viewpoints. (2) Video — dynamic scenes containing multiple moving objects are segmented into rigid components, and the resulting 3D models are directly matched to each other, giving a novel approach to video indexing and retrieval.
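The phrase "geometric constraints associated with different views of the same patches under affine projection" can be made concrete with a small sketch. The code below is not the chapter's actual matching constraint; it only illustrates a basic property that affine multi-view constraints of this kind build on: an affine camera preserves affine combinations, so a point written as a weighted sum of four basis points (weights summing to 1) keeps the same weights in every view. All function names, the camera parameterization `(A, t)`, and the sample values are assumptions made for illustration.

```python
import random

def affine_project(camera, point):
    """Project a 3D point with an affine camera (A, t): p = A * P + t."""
    A, t = camera
    return tuple(
        sum(A[r][c] * point[c] for c in range(3)) + t[r]
        for r in range(2)
    )

def affine_combination(points, weights):
    """Weighted sum of equal-dimension points; weights are expected to sum to 1."""
    dim = len(points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points))
        for k in range(dim)
    )

def random_affine_camera(rng):
    """Hypothetical helper: a random 2x3 matrix A and 2-vector t."""
    A = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    t = [rng.uniform(-5, 5) for _ in range(2)]
    return A, t

def reprojection_gap(camera, basis_3d, weights):
    """Largest coordinate difference between (a) projecting the combined
    3D point and (b) combining the projected basis points with the same weights."""
    direct = affine_project(camera, affine_combination(basis_3d, weights))
    basis_2d = [affine_project(camera, b) for b in basis_3d]
    transferred = affine_combination(basis_2d, weights)
    return max(abs(d - s) for d, s in zip(direct, transferred))

rng = random.Random(0)
basis = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
weights = (0.1, 0.2, 0.3, 0.4)  # sums to 1, so the combination is affine

# The gap stays at floating-point noise level for every random view.
gaps = [reprojection_gap(random_affine_camera(rng), basis, weights) for _ in range(3)]
print(gaps)
```

In a matching setting, a candidate correspondence whose image position disagrees with the position predicted from already-matched points would be treated as geometrically inconsistent. The gap is only guaranteed to vanish when the weights sum to 1; otherwise the camera translation `t` is counted the wrong number of times.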
