3D Affine: An Embedding of Local Image Features for Viewpoint Invariance Using RGB-D Sensor Data

Local image features are invariant to in-plane rotations and robust to minor viewpoint changes. However, the current detectors and descriptors for local image features fail to accommodate out-of-plane rotations larger than 25°–30°. Invariance to such viewpoint changes is essential for numerous applications, including wide baseline matching, 6D pose estimation, and object reconstruction. In this study, we present a general embedding that wraps a detector/descriptor pair in order to increase viewpoint invariance by exploiting input depth maps. The proposed embedding locates smooth surfaces within the input RGB-D images and projects them into a viewpoint invariant representation, enabling the detection and description of more viewpoint invariant features. Our embedding can be utilized with different combinations of descriptor/detector pairs, according to the desired application. Using synthetic and real-world objects, we evaluated the viewpoint invariance of various detectors and descriptors, for both standalone and embedded approaches. While standalone local image features fail to accommodate average viewpoint changes beyond 33.3°, our proposed embedding boosted the viewpoint invariance to different levels, depending on the scene geometry. Objects with distinct surface discontinuities were on average invariant up to 52.8°, and the overall average for all evaluated datasets was 45.4°. Similarly, out of a total of 140 combinations involving 20 local image features and various objects with distinct surface discontinuities, only a single standalone local image feature exceeded the goal of 60° viewpoint difference in just two combinations, as compared with 19 different local image features succeeding in 73 combinations when wrapped in the proposed embedding. Furthermore, the proposed approach operates robustly in the presence of input depth noise, even that of low-cost commodity depth sensors, and well beyond.

[1]  Fernando De la Torre,et al.  Factorized Graph Matching , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  John D. Hunter,et al.  Matplotlib: A 2D Graphics Environment , 2007, Computing in Science & Engineering.

[3]  Jean-Michel Morel,et al.  ASIFT: A New Framework for Fully Affine Invariant Image Comparison , 2009, SIAM J. Imaging Sci..

[4]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[5]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[7]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[9]  Vincent Lepetit,et al.  Learning to Assign Orientations to Feature Points , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Gang Hua,et al.  Discriminant Embedding for Local Image Descriptors , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[12]  Andrew Zisserman,et al.  Learning Local Feature Descriptors Using Convex Optimisation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  George Drettakis,et al.  Depth synthesis and local warps for plausible image-based navigation , 2013, TOGS.

[14]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Wei Li,et al.  Fully affine invariant SURF for image matching , 2012, Neurocomputing.

[16]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[17]  Naixue Xiong,et al.  A FAST-BRISK Feature Detector with Depth Information , 2018, Sensors.

[18]  T. Rabbani,et al.  SEGMENTATION OF POINT CLOUDS USING SMOOTHNESS CONSTRAINT , 2006 .

[19]  Richard M. Murray,et al.  A Mathematical Introduction to Robotic Manipulation , 1994 .

[20]  Andrea Marchetti,et al.  Optimal RANSAC-Towards a Repeatable Algorithm for Finding the Optimal Set , 2013, J. WSCG.

[21]  T. Caliński,et al.  A dendrite method for cluster analysis , 1974 .

[22]  M. R. Brito,et al.  Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection , 1997 .

[23]  Adrien Bartoli,et al.  Fast Explicit Diffusion for Accelerated Features in Nonlinear Scale Spaces , 2013, BMVC.

[24]  Tom Drummond,et al.  Faster and Better: A Machine Learning Approach to Corner Detection , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Giuseppe Valenzise,et al.  Improving distinctiveness of brisk features using depth maps , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[26]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[27]  Jan-Michael Frahm,et al.  3D model matching with Viewpoint-Invariant Patches (VIP) , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Kok-Lim Low Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration , 2004 .

[29]  David Nistér,et al.  Linear Time Maximally Stable Extremal Regions , 2008, ECCV.

[30]  Gaël Varoquaux,et al.  Mayavi: 3D Visualization of Scientific Data , 2010, Computing in Science & Engineering.

[31]  Ian D. Reid,et al.  Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image , 2018, ArXiv.

[32]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[33]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Andriy Myronenko,et al.  Point Set Registration: Coherent Point Drift , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Tal Hassner,et al.  LATCH: Learned arrangements of three patch codes , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[36]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[37]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Pierre Vandergheynst,et al.  FREAK: Fast Retina Keypoint , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Pascal Fua,et al.  Learning to Match Aerial Images with Deep Attentive Architectures , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[41]  Gregory D. Hager,et al.  A Unified Framework for Multi-View Multi-Class Object Pose Estimation , 2018, ECCV.

[42]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[43]  Darius Burschka,et al.  Adaptive and Generic Corner Detection Based on the Accelerated Segment Test , 2010, ECCV.

[44]  Roberto Manduchi,et al.  Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[45]  Brendan McCane,et al.  Better than SIFT? , 2015, Machine Vision and Applications.

[46]  Eric Jones,et al.  SciPy: Open Source Scientific Tools for Python , 2001 .

[47]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[48]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[49]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[52]  Pieter Abbeel,et al.  BigBIRD: A large-scale 3D database of object instances , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[53]  Shahram Izadi,et al.  Modeling Kinect Sensor Noise for Improved 3D Reconstruction and Tracking , 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[54]  Travis E. Oliphant,et al.  Guide to NumPy , 2015 .

[55]  Sander Oude Elberink,et al.  Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications , 2012, Sensors.

[56]  Sing Bing Kang,et al.  Registration and integration of textured 3-D data , 1997, Proceedings. International Conference on Recent Advances in 3-D Digital Imaging and Modeling (Cat. No.97TB100134).

[57]  Kurt Konolige,et al.  CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching , 2008, ECCV.

[58]  Hong Sun,et al.  A Local Feature Descriptor Based on Combination of Structure and Texture Information for Multispectral Image Matching , 2019, IEEE Geoscience and Remote Sensing Letters.

[59]  Vincent Lepetit,et al.  LIFT: Learned Invariant Feature Transform , 2016, ECCV.

[60]  Inderjit S. Dhillon,et al.  Clustering on the Unit Hypersphere using von Mises-Fisher Distributions , 2005, J. Mach. Learn. Res..

[61]  Vincent Lepetit,et al.  Learning Image Descriptors with Boosting , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[62]  Stefano Soatto,et al.  Features for recognition: viewpoint invariance for non-planar scenes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[63]  W. Kabsch A discussion of the solution for the best rotation to relate two sets of vectors , 1978 .

[64]  Torsten Sattler,et al.  Comparative Evaluation of Hand-Crafted and Learned Local Features , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  Takio Kurita,et al.  Computation of Surface Curvature from Range Images Using Geometrically Intrinsic Weights , 1992, MVA.

[66]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[67]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Pietro Perona,et al.  Evaluation of Features Detectors and Descriptors based on 3D Objects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[69]  Wolfram Burgard,et al.  Highly accurate 3D surface models by sparse surface adjustment , 2012, 2012 IEEE International Conference on Robotics and Automation.

[70]  Giuseppe Valenzise,et al.  Local visual features extraction from texture+depth content based on depth image analysis , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[71]  Roland Siegwart,et al.  BRISK: Binary Robust invariant scalable keypoints , 2011, 2011 International Conference on Computer Vision.