Affine invariant regions++

Many problems in Computer Vision require the computation of correspondences between images. To cope with large differences in viewing conditions, several affine invariant region detectors have been developed in recent years. These regions automatically adapt their shape so as to cover the same scene surface in any view. This dissertation builds upon existing detectors and develops several novel techniques that extend the power and functionality of the regions. The advances relate to different subfields of Vision and can be summarized as four main contributions. First and foremost, the thesis presents a powerful Object Recognition system capable of working with large amounts of background clutter, severe occlusion, and strong viewpoint and scale changes. It can handle non-rigid deformations and also finds the contours of the visible parts of the object. The second contribution is a method to obtain region correspondences across several images taken from different viewpoints. These multi-view correspondences are important because they enable the automatic reconstruction of a 3D model from only a few still images; in contrast, this task traditionally requires a complete video as input. A third branch of the thesis introduces a real-time algorithm that tracks the full affine shape of a region as it evolves through a video, and applies it to markerless Augmented Reality, whereas most prior work relies on adding special markers to the scene. Lastly, the thesis presents a technique to automatically find groups of region correspondences lying on planar surfaces. This makes it possible to detect planar scene structures and their geometric transformation between views, which in turn can considerably simplify 3D reconstruction procedures and is useful for robot navigation.

Acknowledgements

Four years is a long period of time. During this period I have met a number of wonderful people who influenced my work in various ways.
First and foremost, I extend my deepest gratitude to Dr. Tinne Tuytelaars, who was always there for me, even though a thousand kilometers separated our workplaces. Throughout the whole PhD, her brilliant intellectual support was second only to her amazing ability to maintain a challenging and exciting working atmosphere, savour successes, and react positively to setbacks. My heartfelt thanks go to my supervisor, Prof. Luc Van Gool, whose enormous drive was a true inspiration. His expert advice, extensive knowledge of the Computer Vision field, and continuous encouragement to improve my work were invaluable. I am grateful to my co-referee, Prof. …
