A Performance Evaluation of Local Features for Image-Based 3D Reconstruction

This paper presents a comprehensive comparative evaluation of state-of-the-art local features for image-based 3D reconstruction. The evaluated features include both recently developed ones learned with powerful machine learning techniques and elaborately designed handcrafted ones. For completeness, we include both float-type and binary features. Two kinds of datasets are used. The first covers many different scene types with ground-truth 3D points and contains images of each scene captured at fixed positions; it supports quantitative evaluation under controlled image capture. The second contains Internet-scale image sets of several landmarks mixed with many unrelated images; it supports qualitative evaluation on freely collected images. Our experimental results show that binary features are capable of reconstructing scenes from controlled image sequences in only a fraction of the processing time required by float-type features. However, for large-scale image sets with many distracting images, float-type features show a clear advantage over binary ones. The classic SIFT remains very stable across scene types in this task and produces reconstruction results that are highly competitive among all the evaluated local features. Finally, although learned binary features are not as competitive as handcrafted ones, learning float-type features with CNNs is promising but still requires considerable further effort.
