CLKN: Cascaded Lucas-Kanade Networks for Image Alignment

This paper proposes a data-driven approach for image alignment. Our main contribution is a novel network architecture that combines the strengths of convolutional neural networks (CNNs) and the Lucas-Kanade algorithm. The main component of this architecture is a Lucas-Kanade layer that performs the inverse compositional algorithm on convolutional feature maps. To train our network, we develop a cascaded feature learning method that incorporates the coarse-to-fine strategy into the training process. This method learns a pyramid representation of convolutional features in a cascaded manner and yields a cascaded network that performs coarse-to-fine alignment on the feature pyramids. We apply our model to the task of homography estimation and perform training and evaluation on a large labeled dataset generated from the MS-COCO dataset. Experimental results show that the proposed approach significantly outperforms the methods it is compared against.
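
To make the two ideas named in the abstract concrete, the following is a minimal sketch of inverse compositional Lucas-Kanade alignment run coarse-to-fine over a pyramid. It is an illustration under simplifying assumptions, not the paper's network: the warp is a pure 2D translation rather than a homography, the "feature maps" are raw intensities of a synthetic Gaussian blob rather than learned convolutional features, and the pyramid is built by average pooling rather than learned. All function names (`gaussian_blob`, `bilinear`, `ic_lk_translation`, `downsample`, `coarse_to_fine`) are hypothetical.

```python
import math

def gaussian_blob(size, cx, cy, sigma):
    """Synthetic single-channel map (assumption): a Gaussian blob at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def bilinear(img, x, y):
    """Sample img at a real-valued (x, y), clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bottom = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bottom * fy

def ic_lk_translation(template, image, p=(0.0, 0.0), iters=50, tol=1e-6):
    """Inverse compositional Lucas-Kanade for the warp W(x; p) = x + p."""
    h, w = len(template), len(template[0])
    # Template gradients and the 2x2 Gauss-Newton Hessian are computed once,
    # which is what distinguishes the inverse compositional formulation.
    grads = []
    hxx = hxy = hyy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = 0.5 * (template[y][x + 1] - template[y][x - 1])
            gy = 0.5 * (template[y + 1][x] - template[y - 1][x])
            grads.append((x, y, gx, gy))
            hxx += gx * gx
            hxy += gx * gy
            hyy += gy * gy
    det = hxx * hyy - hxy * hxy
    px, py = p
    for _ in range(iters):
        bx = by = 0.0
        for x, y, gx, gy in grads:
            err = bilinear(image, x + px, y + py) - template[y][x]
            bx += gx * err
            by += gy * err
        # Solve H * dp = b, then compose the warp with the inverse increment;
        # for a translation this reduces to p <- p - dp.
        dpx = (hyy * bx - hxy * by) / det
        dpy = (hxx * by - hxy * bx) / det
        px, py = px - dpx, py - dpy
        if math.hypot(dpx, dpy) < tol:
            break
    return px, py

def downsample(img):
    """Halve the resolution by 2x2 average pooling."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[0.25 * (img[2 * y][2 * x] + img[2 * y][2 * x + 1]
                     + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1])
             for x in range(w)] for y in range(h)]

def coarse_to_fine(template, image, levels=3):
    """Align at the coarsest level first, then refine at each finer level."""
    pyramid = [(template, image)]
    for _ in range(levels - 1):
        t, i = pyramid[-1]
        pyramid.append((downsample(t), downsample(i)))
    p = (0.0, 0.0)
    for level, (t, i) in enumerate(reversed(pyramid)):
        if level > 0:
            # A translation doubles when the resolution doubles.
            p = (2.0 * p[0], 2.0 * p[1])
        p = ic_lk_translation(t, i, p)
    return p

# The image is the template content shifted by (2.5, -2.0) pixels.
template = gaussian_blob(64, 32.0, 32.0, 6.0)
image = gaussian_blob(64, 34.5, 30.0, 6.0)
estimate = coarse_to_fine(template, image)
print(estimate)
```

On this synthetic pair the estimate should land close to the true shift of (2.5, -2.0). Extending the sketch to a homography would mean replacing the two-parameter translation with an eight-parameter warp and the matching warp Jacobian, which is beyond what this example attempts.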
