Learning Pixel Trajectories with Multiscale Contrastive Random Walks

A range of video modeling tasks, from optical flow to multiple object tracking, share the same fundamental challenge: establishing space-time correspondence. Yet the approaches that dominate each task differ. We take a step towards bridging this gap by extending the recent contrastive random walk formulation to much denser, pixel-level space-time graphs. The main contribution is to introduce hierarchy into the search problem: the transition matrix between two frames is computed in a coarse-to-fine manner, which yields a multiscale contrastive random walk when extended in time. This establishes a unified technique for self-supervised learning of optical flow, keypoint tracking, and video object segmentation. Experiments demonstrate that, for each of these tasks, the unified model achieves performance competitive with strong self-supervised approaches specific to that task.
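The two ideas named above, a random walk whose transition matrices come from feature affinities and a search restricted by a coarser estimate, can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes dot-product affinities, a softmax temperature of 0.07, a palindrome walk (forward through the frames, then back) scored by how often each node returns to itself, and a 1-D search window in place of a 2-D one. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two feature vectors."""
    return sum(x * y for x, y in zip(u, v))


def softmax_rows(scores, temperature=0.07):
    """Row-wise softmax over a list-of-lists score matrix."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp((s - m) / temperature) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def transition(feats_a, feats_b, temperature=0.07):
    """Dense transition matrix from the nodes of frame a to those of frame b."""
    scores = [[dot(fa, fb) for fb in feats_b] for fa in feats_a]
    return softmax_rows(scores, temperature)


def windowed_transition(feats_a, feats_b, centers, radius, temperature=0.07):
    """Transition restricted to a window around a per-node center.

    Assumption: `centers` holds, for each node of frame a, the index in
    frame b suggested by a coarser level (1-D indices for simplicity).
    """
    rows = []
    for fa, c in zip(feats_a, centers):
        lo, hi = max(0, c - radius), min(len(feats_b), c + radius + 1)
        local = softmax_rows([[dot(fa, fb) for fb in feats_b[lo:hi]]], temperature)[0]
        # Candidates outside the window get zero probability.
        rows.append([0.0] * lo + local + [0.0] * (len(feats_b) - hi))
    return rows


def matmul(a, b):
    """Product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cycle_loss(frames, temperature=0.07):
    """Mean negative log-probability that a palindrome walk returns to its start."""
    path = frames + frames[-2::-1]  # f0, f1, ..., fT, ..., f1, f0
    n = len(frames[0])
    walk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in zip(path, path[1:]):
        walk = matmul(walk, transition(a, b, temperature))
    return -sum(math.log(walk[i][i] + 1e-12) for i in range(n)) / n
```

In this sketch the dense `transition` costs one score per pair of nodes, while `windowed_transition` scores only `2 * radius + 1` candidates per node, which is the kind of saving that makes a pixel-level graph tractable once a coarser level supplies the centers.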
