YOLOv3-Based Matching Approach for Roof Region Detection from Drone Images

Due to the large data volume, the UAV image stitching and matching suffers from high computational cost. The traditional feature extraction algorithms—such as Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and Oriented FAST Rotated BRIEF (ORB)—require heavy computation to extract and describe features in high-resolution UAV images. To overcome this issue, You Only Look Once version 3 (YOLOv3) combined with the traditional feature point matching algorithms is utilized to extract descriptive features from the drone dataset of residential areas for roof detection. Unlike the traditional feature extraction algorithms, YOLOv3 performs the feature extraction solely on the proposed candidate regions instead of the entire image, thus the complexity of the image matching is reduced significantly. Then, all the extracted features are fed into Structural Similarity Index Measure (SSIM) to identify the corresponding roof region pair between consecutive image sequences. In addition, the candidate corresponding roof pair by our architecture serves as the coarse matching region pair and limits the search range of features matching to only the detected roof region. This further improves the feature matching consistency and reduces the chances of wrong feature matching. Analytical results show that the proposed method is 13× faster than the traditional image matching methods with comparable performance.

[1]  Seung-Jae Lee,et al.  Application of Recent Developments in Deep Learning to ANN-based Automatic Berthing Systems , 2020 .

[2]  Thomas Serre,et al.  Deep Learning: The Good, the Bad, and the Ugly. , 2019, Annual review of vision science.

[3]  Jana Müllerová,et al.  Assessing the Accuracy of Digital Surface Models Derived from Optical Imagery Acquired with Unmanned Aerial Systems , 2019, Drones.

[4]  Matti Pietikäinen,et al.  Deep Learning for Generic Object Detection: A Survey , 2018, International Journal of Computer Vision.

[5]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[6]  Andreas Kamilaris,et al.  Deep learning in agriculture: A survey , 2018, Comput. Electron. Agric..

[7]  Valeria-Ersilia Oniga,et al.  Determining the Optimum Number of Ground Control Points for Obtaining High Precision Results Based on UAS Images , 2018 .

[8]  Mohamed S. Shehata,et al.  Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images , 2017, ArXiv.

[9]  Björn Schuller,et al.  Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments , 2017 .

[10]  F. Agüera-Vega,et al.  Accuracy of Digital Surface Models and Orthophotos Derived from Unmanned Aerial Vehicle Photogrammetry , 2017 .

[11]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[12]  John W. Gross,et al.  A Statistical Examination of Image Stitching Software Packages For Use With Unmanned Aerial Systems , 2016 .

[13]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[18]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.

[22]  Faraj Alhwarin,et al.  Fast and robust image feature matching methods for computer vision applications , 2011 .

[23]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[24]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[25]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[26]  Yoshitugu Hayashi,et al.  Automatic Generation of 3D Building Models with Multiple Roofs , 2008 .

[27]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[28]  G. LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[29]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[30]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[31]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[32]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[33]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Lisa M. Brown,et al.  A survey of image registration techniques , 1992, CSUR.

[35]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[36]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.