Paper Information - Dimensions of Motion: Learning to Predict a Subspace of Optical Flow from a Single Image

We introduce the problem of predicting, from a single video frame, a low-dimensional subspace of optical flow which includes the actual instantaneous optical flow. We show how several natural scene assumptions allow us to identify an appropriate flow subspace via a set of basis flow fields parameterized by disparity and a representation of object instances. The flow subspace, together with a novel loss function, can be used for the tasks of predicting monocular depth or predicting depth plus an object instance embedding. This provides a new approach to learning these tasks in an unsupervised fashion using monocular input video without requiring camera intrinsics or poses.
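To make the idea of a flow subspace parameterized by disparity concrete, here is a minimal NumPy sketch. It is not the paper's formulation: it assumes a static scene, normalized image coordinates with known intrinsics, and the textbook instantaneous-motion model with six camera degrees of freedom, whereas the abstract describes a setting without known intrinsics and with object instances. The function names `basis_flows` and `project_onto_subspace` are hypothetical, and the mean squared residual merely stands in for the paper's loss, which the abstract does not specify.

```python
import numpy as np


def basis_flows(disparity):
    """Return six basis flow fields of shape (6, h, w, 2) for a disparity map.

    Assumption: normalized pixel coordinates in [-0.5, 0.5] and a static
    scene. Sign conventions differ between texts, but they do not change
    the span of the basis.
    """
    h, w = disparity.shape
    ys, xs = np.meshgrid(
        np.linspace(-0.5, 0.5, h), np.linspace(-0.5, 0.5, w), indexing="ij"
    )
    zeros = np.zeros_like(disparity)
    fields = [
        (disparity, zeros),                # translation along x
        (zeros, disparity),                # translation along y
        (xs * disparity, ys * disparity),  # translation along z
        (xs * ys, 1.0 + ys ** 2),          # rotation about x
        (-(1.0 + xs ** 2), -xs * ys),      # rotation about y
        (ys, -xs),                         # rotation about z
    ]
    return np.stack([np.stack(f, axis=-1) for f in fields], axis=0)


def project_onto_subspace(flow, basis):
    """Least-squares projection of a flow field onto the span of the basis.

    Returns the coefficients, the projected flow, and the mean squared
    residual (an illustrative stand-in for a training loss).
    """
    k = basis.shape[0]
    a = basis.reshape(k, -1).T  # one column per basis field
    b = flow.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(a, b, rcond=None)
    projected = (a @ coeffs).reshape(flow.shape)
    residual = float(np.mean((flow - projected) ** 2))
    return coeffs, projected, residual


# Synthetic check: a flow built from the basis should project onto itself.
rng = np.random.default_rng(0)
disparity = rng.uniform(0.1, 1.0, size=(16, 20))
basis = basis_flows(disparity)
motion = rng.normal(size=6)
flow = np.tensordot(motion, basis, axes=1)  # shape (16, 20, 2)
coeffs, projected, residual = project_onto_subspace(flow, basis)
```

Under these assumptions, a correct disparity map yields a basis whose span contains the observed flow, so the residual is close to zero; a wrong disparity map generally leaves a nonzero residual. That is the sense in which observed flow can supervise a depth prediction without the camera motion being known.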

Richard Strong Bowen | R. Zabih | Noah Snavely | Richard Tucker
