Recurrent Spatial Pyramid CNN for Optical Flow Estimation

Optical flow estimation plays an important role in many multimedia and computer vision tasks. Although recent work has made great progress in applying convolutional neural networks (CNNs) to optical flow estimation, it remains difficult for CNNs to produce optical flow that is both accurate and efficient to compute. In contrast to CNN-based methods, conventional variational methods typically optimize an energy function and produce optical flow with more precise details. Inspired by the effectiveness of both variational methods and deep CNNs, we propose a recurrent spatial pyramid (RecSPy) network for optical flow estimation. To handle large displacements and to reduce the number of parameters, we formulate the spatial pyramid as a recurrent process and adopt a CNN to refine the optical flow at each spatial scale. Furthermore, to recover more precise details, we propose an energy function that encodes structure and constancy constraints and use it to further refine the optical flow at each spatial scale. The combination of the proposed RecSPy network and the proposed energy-based refinement enables our system to estimate optical flow effectively and efficiently. Experimental results on benchmark datasets validate the effectiveness and efficiency of the proposed method.
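
The recurrent coarse-to-fine idea described above can be illustrated with a minimal NumPy sketch. This is not the RecSPy architecture: the function names (`downsample`, `upsample_flow`, `warp`, `recurrent_pyramid_flow`), the factor-of-two pyramid, the nearest-neighbour warping, and the `refine` callable that stands in for the shared CNN are all assumptions made for illustration, and the energy-based refinement step is omitted because the abstract does not specify it. The point it shows is only that one refinement function is reused at every scale, which is what keeps the parameter count independent of the number of pyramid levels.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2-D image by an integer factor (dims must be divisible)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample_flow(flow, factor=2):
    """Nearest-neighbour upsample of an (H, W, 2) flow field.
    Flow vectors are multiplied by the factor because they are measured in pixels."""
    return np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1) * factor

def warp(img, flow):
    """Backward-warp img by flow with nearest-neighbour sampling and border clamping."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_src = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y_src = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[y_src, x_src]

def recurrent_pyramid_flow(img1, img2, refine, levels=3):
    """Coarse-to-fine flow estimation that applies the SAME `refine` at every level.
    `refine(first, warped_second, flow)` is a hypothetical stand-in for a shared CNN
    and must return a residual flow with the same shape as `flow`."""
    h, w = img1.shape
    coarsest = 2 ** (levels - 1)
    flow = np.zeros((h // coarsest, w // coarsest, 2))
    for level in range(levels):
        factor = 2 ** (levels - 1 - level)
        first = downsample(img1, factor)
        second = downsample(img2, factor)
        if level > 0:
            flow = upsample_flow(flow)          # carry the estimate to the finer scale
        warped = warp(second, flow)             # compensate the motion found so far
        flow = flow + refine(first, warped, flow)  # residual update, shared weights
    return flow

# Toy usage: a refiner that always predicts a residual of one pixel.
def constant_refiner(first, warped, flow):
    return np.ones_like(flow)

rng = np.random.default_rng(0)
frame1 = rng.random((16, 16))
frame2 = rng.random((16, 16))
flow = recurrent_pyramid_flow(frame1, frame2, constant_refiner, levels=3)
```

In a learned version, `refine` would be a single trained network, and the loop would be the recurrence: the state passed between iterations is the current flow estimate, upsampled and rescaled at each step.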
