Learning Real-Time Perspective Patch Rectification

We propose two learning-based methods for patch rectification that are faster and more reliable than state-of-the-art affine region detection methods. Given a reference view of a patch, they can quickly recognize it in new views and accurately estimate the homography between the reference view and the new view. Our methods consume more memory than affine region detectors and are, in practice, currently limited to a few tens of patches. However, if the reference image is a fronto-parallel view and the camera's internal parameters are known, a single patch is often enough to precisely estimate an object pose. As a result, we can deal in real time with objects that are significantly less textured than those required by state-of-the-art methods.

The first method favors fast run-time performance, while the second is designed for fast real-time learning and robustness. However, both follow the same general approach. First, a classifier provides, for every keypoint, a first estimate of its transformation. Then, this estimate allows an accurate perspective rectification to be carried out using linear predictors. The last step is a fast verification of the patch identity, made possible by the accurate perspective rectification, together with an estimation of the patch position with sub-pixel precision. We demonstrate the advantages of our approach on real-time 3D object detection and tracking applications.
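The remark that a single patch can suffice for pose estimation follows from a standard relation: for points on the plane Z = 0, a homography that maps plane coordinates to pixels satisfies H ~ K [r1 r2 t], where K holds the internal parameters. The sketch below illustrates that relation only and is not the paper's implementation. It assumes a zero-skew K given as `fx, fy, cx, cy`, a reference patch expressed in metric plane coordinates, and a simple Gram-Schmidt step to orthonormalize the rotation; the function name `pose_from_homography` is hypothetical.

```python
import math

def _col(M, j):
    return [M[i][j] for i in range(3)]

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def pose_from_homography(H, fx, fy, cx, cy):
    """Recover (R, t) from H ~ K [r1 r2 t] for a plane at Z = 0.

    H is a 3x3 nested list mapping plane coordinates to pixels.
    R is returned as a list of its three columns [r1, r2, r3].
    Assumption: zero-skew K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    # M = K^-1 H, using the closed-form inverse of a zero-skew K.
    M = [[(H[0][j] - cx * H[2][j]) / fx for j in range(3)],
         [(H[1][j] - cy * H[2][j]) / fy for j in range(3)],
         [H[2][j] for j in range(3)]]
    m1, m2, m3 = _col(M, 0), _col(M, 1), _col(M, 2)

    # H is defined up to scale: choose the scale so r1 and r2 have unit
    # norm on average, and the sign so the plane is in front of the camera.
    scale = 2.0 / (_norm(m1) + _norm(m2))
    if m3[2] < 0:
        scale = -scale
    m1 = [scale * x for x in m1]
    m2 = [scale * x for x in m2]
    t = [scale * x for x in m3]

    # Gram-Schmidt: make r1, r2 orthonormal, then complete with r3.
    n1 = _norm(m1)
    r1 = [x / n1 for x in m1]
    proj = _dot(r1, m2)
    r2 = [x - proj * y for x, y in zip(m2, r1)]
    n2 = _norm(r2)
    r2 = [x / n2 for x in r2]
    r3 = _cross(r1, r2)
    return [r1, r2, r3], t
```

With noisy homographies, Gram-Schmidt favors the first column; projecting onto the nearest rotation with an SVD is a common alternative, at the cost of a linear-algebra dependency.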
