A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching

In this paper, we propose a novel method that uses a two-stream deep network to precisely match two aerial images obtained in different environments. By internally augmenting the target image, the two streams of the network take three input images, and the additional augmented pair is incorporated into training. As a result, the training process is regularized and the network becomes robust to variations in aerial images. Furthermore, we introduce an ensemble method based on the bidirectional network, motivated by the isomorphic nature of the geometric transformation. We obtain two sets of global transformation parameters without any additional network or parameters; fusing these two outcomes alleviates asymmetric matching results and significantly improves performance. For the experiments, we use aerial images from Google Earth and the International Society for Photogrammetry and Remote Sensing (ISPRS). To quantitatively assess our results, we apply the probability of correct keypoints (PCK) metric, which measures the degree of matching. The qualitative and quantitative results show a sizable performance gap over conventional methods for aerial image matching. All code, our trained model, and the dataset are available online.
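The bidirectional ensemble relies on the fact that an invertible geometric transformation from image A to image B has an inverse mapping B back to A. Inverting the backward estimate therefore yields a second estimate of the forward transformation at no extra cost. The following is a minimal sketch under stated assumptions: both estimates are 2x3 affine matrices in the same coordinate frame, and the two outcomes are fused by a simple element-wise average. The function names are hypothetical, and this averaging rule is an illustrative choice rather than a confirmed description of the authors' fusion step.

```python
import numpy as np


def to_homogeneous(theta):
    """Lift a 2x3 affine matrix to a 3x3 homogeneous matrix."""
    return np.vstack([np.asarray(theta, dtype=float), [0.0, 0.0, 1.0]])


def bidirectional_ensemble(theta_ab, theta_ba):
    """Fuse a forward estimate (A->B) with the inverse of a backward estimate (B->A).

    Assumption (hypothetical helper): both inputs are 2x3 affine matrices in the
    same coordinate frame, and fusion is an element-wise average.
    """
    theta_ab = np.asarray(theta_ab, dtype=float)
    # Inverting the B->A estimate gives a second estimate of the A->B transform.
    theta_ba_inv = np.linalg.inv(to_homogeneous(theta_ba))[:2, :]
    # Average the two forward estimates to reduce asymmetric errors.
    return 0.5 * (theta_ab + theta_ba_inv)


# Usage: the forward pass predicts a 0.2 shift in x, the backward pass predicts identity.
shift = np.array([[1.0, 0.0, 0.2], [0.0, 1.0, 0.0]])
identity = np.hstack([np.eye(2), np.zeros((2, 1))])
print(bidirectional_ensemble(shift, identity))
```

When the two estimates are perfectly consistent (the backward matrix is the exact inverse of the forward one), the fused result equals the forward estimate; when they disagree, the average falls between them.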

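The PCK metric counts a keypoint as correct when its transformed location falls within a distance threshold of the ground-truth location. A common convention sets this threshold to alpha * max(h, w) for an image of height h and width w. The sketch below assumes that convention, pixel coordinates, and a 2x3 affine transformation. The value alpha = 0.05 and the helper names are illustrative assumptions, not settings taken from this paper.

```python
import numpy as np


def warp_points(theta, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    theta = np.asarray(theta, dtype=float)
    pts = np.asarray(points, dtype=float)
    return pts @ theta[:, :2].T + theta[:, 2]


def pck(pred_points, gt_points, image_size, alpha=0.05):
    """Fraction of keypoints within alpha * max(h, w) pixels of the ground truth.

    Assumption: image_size is (h, w) and both point arrays are (N, 2) in pixels.
    """
    pred = np.asarray(pred_points, dtype=float)
    gt = np.asarray(gt_points, dtype=float)
    h, w = image_size
    threshold = alpha * max(h, w)
    # Euclidean distance between each predicted and ground-truth keypoint.
    dist = np.linalg.norm(pred - gt, axis=1)
    return float(np.mean(dist <= threshold))


# Usage: warp source keypoints with an estimated transform, then score them.
src = np.array([[10.0, 10.0], [50.0, 50.0]])
theta_est = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
print(pck(warp_points(theta_est, src), src + [2.0, -1.0], (100, 100)))
```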