Self-Supervised Keypoint Detection Based on Multi-Layer Random Forest Regressor

This paper proposes a keypoint regressor (KeyReg), which consists of a multi-layer random forest (MRF) regressor module and a single random forest (SRF) classifier module. To increase keypoint repeatability, the MRF regressor is applied to multi-scale versions of the input image, with the same learned rules shared across all scales. The SRF then assigns each keypoint predicted at each scale a confidence score that measures its reliability. Final keypoints are selected from these candidates by non-maximum suppression based on the confidence scores. The MRF of KeyReg is designed to follow a coarse-to-fine structure by varying the number of nodes per layer. In addition, matching accuracy can be improved by using the SRF classifier to remove low-confidence keypoints. KeyReg is the first approach to apply an MRF to keypoint regression. Unlike DNN-based approaches, it is designed to run on a CPU rather than a GPU. KeyReg was trained on COCO, with positive and negative examples obtained automatically through self-supervised learning between each original image and a warped version of it. On the HPatches dataset, KeyReg outperformed state-of-the-art methods in terms of repeatability, homography accuracy, mean matching accuracy (MMA), and localization error.
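The abstract does not say how candidates from different scales are merged or what suppression window is used. The following is a minimal sketch of confidence-based non-maximum suppression over multi-scale candidates, under several assumptions:

- Each candidate is an (x, y, confidence) tuple, with the confidence standing in for the SRF score.
- Coordinates from a resized image are mapped back to full resolution by dividing by the resize factor.
- Suppression is a greedy fixed-radius scheme.

The names `merge_scales` and `confidence_nms` and the `radius` and `min_conf` parameters are hypothetical. They are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# (x, y, confidence); the confidence is assumed to come from a classifier such as the SRF.
Candidate = Tuple[float, float, float]


def merge_scales(per_scale: Dict[float, List[Candidate]]) -> List[Candidate]:
    """Map candidates found on resized images back to full-resolution coordinates.

    Keys are resize factors (0.5 means the image was shrunk to half size).
    """
    merged: List[Candidate] = []
    for factor, cands in per_scale.items():
        merged.extend((x / factor, y / factor, s) for x, y, s in cands)
    return merged


def confidence_nms(candidates: List[Candidate], radius: float = 4.0,
                   min_conf: float = 0.5) -> List[Candidate]:
    """Greedy radius-based NMS that keeps the most confident point in each neighborhood.

    `radius` and `min_conf` are illustrative defaults, not values from the paper.
    """
    # Drop low-confidence candidates before suppression.
    pool = [c for c in candidates if c[2] >= min_conf]
    # Visit candidates from most to least confident.
    pool.sort(key=lambda c: c[2], reverse=True)
    kept: List[Candidate] = []
    for x, y, s in pool:
        # Keep a point only if no stronger point has already been kept within `radius`.
        if all(math.hypot(x - kx, y - ky) > radius for kx, ky, _ in kept):
            kept.append((x, y, s))
    return kept


per_scale = {
    1.0: [(10.0, 10.0, 0.9), (60.0, 5.0, 0.3)],
    0.5: [(5.5, 5.25, 0.8), (20.0, 20.0, 0.7)],
}
print(confidence_nms(merge_scales(per_scale)))
# → [(10.0, 10.0, 0.9), (40.0, 40.0, 0.7)]
```

In this example, the half-scale detection at (5.5, 5.25) maps to (11.0, 10.5) at full resolution. It is suppressed by the stronger full-scale point at (10.0, 10.0). The 0.3-confidence point is discarded before suppression.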
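The abstract states that positive and negative examples are obtained automatically from an original image and a warped copy. It does not give the labeling rule. The following is a minimal sketch of one plausible rule, stated as an assumption:

- A point in the original image is projected into the warped image with the known 3x3 homography H.
- The point is labeled positive if some point detected in the warped image lies within `eps` pixels of its projection, and negative otherwise.

The same test yields a simplified, one-directional repeatability score. The names `project`, `label_by_reprojection` and `repeatability` and the threshold `eps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3 matrix as nested lists


def project(H: Homography, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point using homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def label_by_reprojection(orig_pts: List[Point], warped_pts: List[Point],
                          H: Homography, eps: float = 3.0) -> List[int]:
    """Hypothetical labeling rule (not specified in the abstract):
    1 if a warped-image point lies within `eps` pixels of the projection, else 0."""
    labels: List[int] = []
    for p in orig_pts:
        qx, qy = project(H, p)
        hit = any(math.hypot(qx - wx, qy - wy) <= eps for wx, wy in warped_pts)
        labels.append(1 if hit else 0)
    return labels


def repeatability(orig_pts: List[Point], warped_pts: List[Point],
                  H: Homography, eps: float = 3.0) -> float:
    """Fraction of original points re-detected in the warped image (one-directional)."""
    if not orig_pts:
        return 0.0
    labels = label_by_reprojection(orig_pts, warped_pts, H, eps)
    return sum(labels) / len(labels)


# A pure translation by (+5, -2) as the example homography.
H = [[1, 0, 5], [0, 1, -2], [0, 0, 1]]
orig = [(10.0, 10.0), (30.0, 40.0), (50.0, 50.0)]
warped = [(15.5, 8.0), (35.0, 38.0), (90.0, 90.0)]
print(label_by_reprojection(orig, warped, H))  # → [1, 1, 0]
```

In this sketch, a point that projects outside the warped image is simply labeled negative. A fuller implementation would likely exclude such points. HPatches repeatability is usually computed symmetrically over the region visible in both images.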
