Dimensions of Motion: Learning to Predict a Subspace of Optical Flow from a Single Image

We introduce the problem of predicting, from a single video frame, a low-dimensional subspace of optical flow that contains the actual instantaneous optical flow. We show how several natural scene assumptions allow us to identify an appropriate flow subspace via a set of basis flow fields parameterized by disparity and a representation of object instances. The flow subspace, together with a novel loss function, can be used to predict monocular depth, or depth plus an object instance embedding. This provides a new approach to learning these tasks in an unsupervised fashion from monocular input video, without requiring camera intrinsics or poses. Project page: https://dimensions-of-motion.github.io/.
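
The idea of a disparity-parameterized flow subspace can be illustrated with a small NumPy sketch. It is not the method from the paper: it assumes a static scene, pure camera translation, normalized pixel coordinates with the principal point at the image center, and a plain least-squares residual in place of the paper's loss. The function names `translation_basis` and `projection_residual` are hypothetical.

```python
import numpy as np


def translation_basis(disparity):
    """Hypothetical helper: three basis flow fields for camera translation.

    disparity: (H, W) array of inverse depth.
    Returns an array of shape (3, H, W, 2) holding a (u, v) flow per pixel.
    """
    h, w = disparity.shape
    # Normalized pixel coordinates centered on the image (an assumption:
    # the principal point is taken to be the image center).
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w), indexing="ij"
    )
    zeros = np.zeros_like(disparity)
    bx = np.stack([disparity, zeros], axis=-1)                # translation along x
    by = np.stack([zeros, disparity], axis=-1)                # translation along y
    bz = np.stack([xs * disparity, ys * disparity], axis=-1)  # translation along z
    return np.stack([bx, by, bz])


def projection_residual(flow, basis):
    """Least-squares fit of `flow` in span(basis).

    Returns the fitted coefficients and the norm of the part of `flow`
    that lies outside the subspace.
    """
    a = basis.reshape(basis.shape[0], -1).T  # (H*W*2, K) design matrix
    b = flow.reshape(-1)                     # (H*W*2,) observed flow
    coeffs, *_ = np.linalg.lstsq(a, b, rcond=None)
    residual = b - a @ coeffs
    return coeffs, float(np.linalg.norm(residual))


# Usage: a flow synthesized from the basis lies in the subspace,
# so its residual is numerically zero; unrelated noise does not.
rng = np.random.default_rng(0)
disparity = rng.uniform(0.1, 1.0, size=(8, 10))
basis = translation_basis(disparity)
true_coeffs = np.array([0.5, -0.2, 0.1])
flow = np.tensordot(true_coeffs, basis, axes=1)  # (H, W, 2)
coeffs, res = projection_residual(flow, basis)
noise_coeffs, noise_res = projection_residual(rng.normal(size=flow.shape), basis)
```

In this toy setting, a residual near zero indicates that the observed flow is explained by the disparity map, which is the kind of signal a subspace-based loss could use to train a depth predictor without known camera motion.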
