Point matching as a classification problem for fast and robust object pose estimation

We propose a novel approach to point matching under large viewpoint and illumination changes that are suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods. Most of these methods rely either on using ad hoc local descriptors or on estimating local affine deformations. By contrast, we treat wide baseline matching of key points as a classification problem, in which each class corresponds to the set of all possible views of such a point. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual key points and by using statistical classification tools to produce a compact description of this view set. At run-time, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use a classification method to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. In the context of pose estimation, we present experimental results for both planar and non-planar objects in the presence of occlusions, illumination changes, and cluttered backgrounds. We show that the method is both reliable and suitable for initializing real-time applications.

[1]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[2]  Luc Van Gool,et al.  Recognizing color patterns irrespective of viewpoint and illumination , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[3]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[4]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Luc Van Gool,et al.  Wide Baseline Stereo Matching based on Local, Affinely Invariant Regions , 2000, BMVC.

[6]  Larry S. Davis,et al.  Model-based object pose in 25 lines of code , 1992, International Journal of Computer Vision.

[7]  Yali Amit,et al.  Joint Induction of Shape Features and Tree Classifiers , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Frédéric Jurie,et al.  Solution of the Simultaneous Pose and Correspondence Problem Using Gaussian Error Model , 1999, Comput. Vis. Image Underst..

[9]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[10]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[11]  Vincent Lepetit,et al.  Fusing online and offline information for stable 3D tracking in real-time , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[12]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Cordelia Schmid,et al.  IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2004, Washington, DC, USA, June 27 - July 2, 2004 , 2004, CVPR Workshops.

[14]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[15]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[16]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[17]  Andrew Zisserman,et al.  Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?" , 2002, ECCV.

[18]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[19]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[20]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent , 1999, NIPS.

[21]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[22]  Michel Dhome,et al.  Recognition of 3D textured objects by mixing view-based and model-based representations , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[23]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[24]  Hiroshi Murase,et al.  Real-time 100 object recognition system , 1996, Proceedings of IEEE International Conference on Robotics and Automation.

[25]  M. V. Rossum,et al.  In Neural Computation , 2022 .

[26]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[27]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.