Disentangling Architecture and Training for Optical Flow

How important are training details and datasets to recent optical flow architectures like RAFT? And do they generalize? To explore these questions, rather than develop a new architecture, we revisit three prominent architectures, PWC-Net, IRR-PWC and RAFT, with a common set of modern training techniques and datasets, and observe significant performance gains, demonstrating the importance and generality of these training details. Our newly trained PWC-Net and IRR-PWC show surprisingly large improvements, up to 30% versus original published results on Sintel and KITTI 2015 benchmarks. Our newly trained RAFT obtains an Fl-all score of 4.31% on KITTI 2015 and an avg. rank of 1.7 for end-point error on Middlebury. Our results demonstrate the benefits of separating the contributions of architectures, training techniques and datasets when analyzing performance gains of optical flow methods. Our source code is available at https://autoflow-google.github.io.

[1]  Haoqiang Fan,et al.  Learning Optical Flow with Adaptive Graph Reasoning , 2022, AAAI.

[2]  Yifan Zhou,et al.  CSFlow: Learning Optical Flow via Cross Strip Correlation for Autonomous Driving , 2022, 2022 IEEE Intelligent Vehicles Symposium (IV).

[3]  Trevor Darrell,et al.  A ConvNet for the 2020s , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Philip H. S. Torr,et al.  Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Ross Wightman,et al.  ResNet strikes back: An improved training procedure in timm , 2021, ArXiv.

[6]  Zachary Teed,et al.  RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching , 2021, 2021 International Conference on 3D Vision (3DV).

[7]  Jakob Uszkoreit,et al.  How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers , 2021, Trans. Mach. Learn. Res..

[8]  Daniel Maurer,et al.  SMURF: Self-Teaching Multi-Frame Unsupervised RAFT with Full-Image Warping , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Varun Jampani,et al.  AutoFlow: Learning a Better Training Set for Optical Flow , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jianfei Cai,et al.  High-Resolution Optical Flow from 1D Attention and Correlation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Shihao Jiang,et al.  Learning to Estimate Hidden Motions with Global Motion Aggregation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Ekin D. Cubuk,et al.  Revisiting ResNets: Improved Training and Scaling Strategies , 2021, NeurIPS.

[13]  Matthieu Cord,et al.  Training data-efficient image transformers & distillation through attention , 2020, ICML.

[14]  Jia Deng,et al.  RAFT-3D: Scene Flow using Rigid-Motion Embeddings , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[16]  Andrés Bruhn,et al.  An Anisotropic Selection Scheme for Variational Optical Flow Methods with Order-Adaptive Regularisation , 2021, SSVM.

[17]  Hongdong Li,et al.  Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation , 2020, NeurIPS.

[18]  Yuchao Dai,et al.  PRAFlow_RVC: Pyramid Recurrent All-Pairs Field Transforms for Optical Flow Estimation in Robust Vision Challenge 2020 , 2020, ArXiv.

[19]  Deqing Sun,et al.  Learnable Cost Volume Using the Cayley Representation , 2020, ECCV.

[20]  Lei Zhang,et al.  Suppress and Balance: A Simple Gated Network for Salient Object Detection , 2020, ECCV.

[21]  Jonathan T. Barron,et al.  What Matters in Unsupervised Optical Flow , 2020, ECCV.

[22]  Feiyue Huang,et al.  Learning by Analogy: Reliable Supervision From Transformations for Unsupervised Optical Flow Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jia Deng,et al.  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow , 2020, ECCV.

[24]  Yan Xu,et al.  MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiao Chen,et al.  FOAL: Fast Online Adaptive Learning for Cardiac Motion Estimation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Chen Sun,et al.  D3D: Distilled 3D Networks for Video Action Recognition , 2018, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[27]  Jan Kautz,et al.  Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Abdelaziz Djelouah,et al.  Neural Inter-Frame Compression for Video Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  In So Kweon,et al.  Deep Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Michael R. Lyu,et al.  SelFlow: Self-Supervised Learning of Optical Flow , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Chuang Gan,et al.  The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Stefan Roth,et al.  Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Xiaoyun Zhang,et al.  Depth-Aware Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Trevor Darrell,et al.  Hierarchical Discrete Distribution Decomposition for Match Density Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Michael J. Black,et al.  Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Deva Ramanan,et al.  Volumetric Correspondence Networks for Optical Flow , 2019, NeurIPS.

[38]  Zhidong Deng,et al.  SegStereo: Exploiting Semantic Information for Disparity Estimation , 2018, ECCV.

[39]  Xiaoou Tang,et al.  LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  James M. Rehg,et al.  Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation , 2018, ECCV.

[41]  Chuang Gan,et al.  End-to-End Learning of Motion Representation for Video Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Jan Kautz,et al.  Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Stefan Roth,et al.  UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss , 2017, AAAI.

[44]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Michael J. Black,et al.  Optical Flow Estimation Using a Spatial Pyramid Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Konstantinos G. Derpanis,et al.  Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness , 2016, ECCV Workshops.

[49]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Bernd Jähne,et al.  The HCI Benchmark Suite: Stereo and Flow Ground Truth with Uncertainties for Urban Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[51]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[54]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[55]  Ying Wu,et al.  Large Displacement Optical Flow from Nearest Neighbor Fields , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[56]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[57]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Richard Szeliski,et al.  Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[59]  Michael J. Black,et al.  Secrets of optical flow estimation and their principles , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[60]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[62]  Horst Bischof,et al.  A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.

[63]  David J. Fleet,et al.  Performance of optical flow techniques , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.