Simultaneous Object Recognition and Segmentation by Image Exploration

Methods based on local, viewpoint-invariant features have proven capable of recognizing objects in spite of viewpoint changes, occlusion and clutter. However, they fail when these factors become too severe, because the features have limited repeatability and discriminative power. As further shortcomings, the objects need to be rigid and only their approximate location is found. We present a novel object recognition approach which overcomes these limitations. An initial set of feature correspondences is first generated. The method anchors on it and then gradually explores the surrounding area, trying to construct more and more matching features, increasingly far from the initial ones. The resulting process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. Very few correct initial matches suffice for reliable recognition. The experimental results demonstrate that the presented method deals better with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Moreover, non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. The approach can extend any viewpoint-invariant feature extractor.
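
The anchor-and-explore idea described above can be illustrated with a deliberately simplified toy sketch. It is not the authors' implementation: the grid of model cells, the pure-translation offsets, and the names `explore` and `appearance_ok` are all assumptions made for illustration, and a real system would propagate richer (for example affine) transformations and compare actual image appearance.

```python
from collections import deque

def neighbours(cell):
    """4-connected neighbours of a grid cell."""
    x, y = cell
    return ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))

def explore(seeds, appearance_ok, grid_w, grid_h):
    """Grow initial matches over a grid of model-image cells (toy sketch).

    seeds: dict mapping a cell (x, y) to a candidate offset (dx, dy)
           into the test image; some seeds may be wrong.
    appearance_ok(cell, offset): hypothetical similarity test that says
           whether the cell, moved by offset, looks alike in both images.
    Returns the dict of matches that survive.
    """
    matches = dict(seeds)
    frontier = deque(seeds)
    # Exploration: try to construct new matches around existing ones by
    # propagating each match's offset to its still-unmatched neighbours.
    while frontier:
        cell = frontier.popleft()
        offset = matches[cell]
        for n in neighbours(cell):
            inside = 0 <= n[0] < grid_w and 0 <= n[1] < grid_h
            if inside and n not in matches and appearance_ok(n, offset):
                matches[n] = offset
                frontier.append(n)
    # Filtering: a wrong seed cannot spread, so it stays isolated.
    # Keep only matches that at least one neighbour agrees with.
    return {
        c: o for c, o in matches.items()
        if any(matches.get(n) == o for n in neighbours(c))
    }

# Synthetic scene (assumed data): a 4x4 object displaced by TRUE_OFFSET.
OBJECT = {(x, y) for x in range(2, 6) for y in range(1, 5)}
TRUE_OFFSET = (7, -3)

def appearance_ok(cell, offset):
    # Stand-in for an appearance comparison: succeeds only on the object
    # and only under the correct displacement.
    return cell in OBJECT and offset == TRUE_OFFSET

# One correct seed on the object, one wrong seed on the background.
seeds = {(3, 2): TRUE_OFFSET, (0, 0): (1, 1)}
result = explore(seeds, appearance_ok, grid_w=8, grid_h=6)
```

In this toy setting a single correct seed is enough for the surviving matches to cover exactly the object cells, so the set of matched cells doubles as a segmentation, while the wrong seed is discarded because it never gains supporting neighbours.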