Convolutional feature pyramid fusion via attention network

We present a novel fusion scheme between multiple intermediate convolutional features within convolutional neurual network (CNN) for dense correspondence estimation. In contrast to existing CNN-based descriptors that utilize a single convolutional activation, our approach jointly uses multiple intermediate features of CNN through the attention weight that balances the contribution of each features. We formulate the overall network as two sub-networks, correspondence network and attention network. The correspondence network is designed to provide multiple intermediate matching costs while the attention network is to learn the optimal weight between them. These two networks are learned in a joint manner to boost the correspondence estimation performance. Experiments demonstrate that our proposed method outperforms the state-of-the-art methods on various correspondence estimation tasks including depth estimation, optical flow, and semantic correspondence.

[1]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[2]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[3]  Cordelia Schmid,et al.  DeepMatching: Hierarchical Deformable Dense Matching , 2015, International Journal of Computer Vision.

[4]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[5]  Yung-Yu Chuang,et al.  Robust image alignment with multiple feature descriptors and matching-guided neighborhoods , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[8]  Jitendra Malik,et al.  Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Stephen Lin,et al.  FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[13]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[15]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Raquel Urtasun,et al.  Efficient Deep Learning for Stereo Matching , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yoichi Sato,et al.  Joint Recovery of Dense Correspondence and Cosegmentation in Two Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  Min Bai,et al.  Exploiting Semantic Information and Deep Matching for Optical Flow , 2016, ECCV.

[22]  Cordelia Schmid,et al.  EpicFlow: Edge-preserving interpolation of correspondences for optical flow , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[24]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[25]  Peter F. Sturm,et al.  A Factorization Based Algorithm for Multi-Image Projective Structure and Motion , 1996, ECCV.

[26]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[27]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.