PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

We present a compact but effective CNN model for optical flow, called PWC-Net. PWC-Net has been designed according to simple and well-established principles: pyramidal processing, warping, and the use of a cost volume. Cast in a learnable feature pyramid, PWC-Net uses the current optical flow estimate to warp the CNN features of the second image. It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow. PWC-Net is 17 times smaller in size and easier to train than the recent FlowNet2 model. Moreover, it outperforms all published optical flow methods on the MPI Sintel final pass and KITTI 2015 benchmarks, running at about 35 fps on Sintel resolution (1024 × 436) images. Our models are available on our project website.
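The warping and cost-volume steps described above can be illustrated with a minimal NumPy sketch for a single pyramid level. This is not the released PWC-Net code. The function names `warp` and `cost_volume`, the bilinear sampling, the border clamping, the search radius `d`, and the use of a mean dot product as the matching cost are all assumptions made for illustration.

```python
import numpy as np

def warp(feat2, flow):
    """Sample feat2 (H, W, C) at x + flow(x); flow is (H, W, 2) holding (dx, dy).

    Assumption: bilinear interpolation with coordinates clamped to the border.
    """
    H, W, _ = feat2.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * feat2[y0, x0] + wx * feat2[y0, x1]
    bottom = (1 - wx) * feat2[y1, x0] + wx * feat2[y1, x1]
    return (1 - wy) * top + wy * bottom

def cost_volume(feat1, feat2_warped, d=1):
    """Correlate feat1 with feat2_warped over offsets in [-d, d]^2.

    Assumption: the matching cost is the mean dot product over channels,
    and positions outside the image are zero-padded.
    Returns an array of shape (H, W, (2d + 1)**2).
    """
    H, W, _ = feat1.shape
    padded = np.pad(feat2_warped, ((d, d), (d, d), (0, 0)))
    costs = []
    for dy in range(2 * d + 1):
        for dx in range(2 * d + 1):
            # Window of feat2_warped displaced by (dy - d, dx - d).
            shifted = padded[dy:dy + H, dx:dx + W]
            costs.append((feat1 * shifted).mean(axis=-1))
    return np.stack(costs, axis=-1)

# Toy check: the second feature map is the first shifted right by one pixel,
# so a flow of (1, 0) should warp it back onto the first.
rng = np.random.default_rng(0)
feat1 = rng.standard_normal((6, 8, 4))
feat2 = np.roll(feat1, 1, axis=1)
flow = np.zeros((6, 8, 2))
flow[..., 0] = 1.0
warped = warp(feat2, flow)
cv = cost_volume(feat1, warped, d=1)
```

In this toy setup the warped features match the first image's features everywhere except the last column, where the sample falls outside the image. In the model described above, a CNN would take such a cost volume as input and estimate the flow from it; that stage is omitted here.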
