Upgrading Optical Flow to 3D Scene Flow Through Optical Expansion

We describe an approach for upgrading 2D optical flow to 3D scene flow. Our key insight is that dense optical expansion – which can be reliably inferred from monocular frame pairs – reveals changes in depth of scene elements, e.g., things moving closer will get bigger. When integrated with camera intrinsics, optical expansion can be converted into a normalized 3D scene flow vectors that provide meaningful directions of 3D movement, but not their magnitude (due to an underlying scale ambiguity). Normalized scene flow can be further “upgraded” to the true 3D scene flow knowing depth in one frame. We show that dense optical expansion between two views can be learned from annotated optical flow maps or unlabeled video sequences, and applied to a variety of dynamic 3D perception tasks including optical scene flow, LiDAR scene flow, time-to-collision estimation and depth estimation, often demonstrating significant improvement over the prior art.

[1]  D Marr,et al.  A computational theory of human stereo vision. , 1979, Proceedings of the Royal Society of London. Series B, Biological sciences.

[2]  S. Ullman The interpretation of structure from motion , 1979, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[3]  H. C. Longuet-Higgins,et al.  The interpretation of a moving retinal image , 1980, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[4]  Drew H. Abney,et al.  Journal of Experimental Psychology : Human Perception and Performance Influence of Musical Groove on Postural Sway , 2015 .

[5]  W. Gogel,et al.  Perceived size and motion in depth from optical expansion , 1986, Perception & psychophysics.

[6]  W. Warren,et al.  Perception of translational heading from optical flow. , 1988, Journal of experimental psychology. Human perception and performance.

[7]  Martin Herman,et al.  Real-time single-workstation obstacle avoidance using only wide-field flow divergence , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[8]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[10]  Paul R. Schrater,et al.  Perceiving visual expansion without optic flow , 2001, Nature.

[11]  S. Shankar Sastry,et al.  Two-View Multibody Structure from Motion , 2005, International Journal of Computer Vision.

[12]  Wolfgang Förstner,et al.  Uncertainty and Projective Geometry , 2005 .

[13]  Christian Beder,et al.  Determining an Initial Image Pair for Fixing the Scale of a 3D Reconstruction from an Image Sequence , 2006, DAGM-Symposium.

[14]  Amaury Nègre,et al.  Real-Time Time-to-Collision from Variation of Intrinsic Scale , 2006, ISER.

[15]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Luc Van Gool,et al.  Depth and Appearance for Mobile Scene Analysis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  B.K.P. Horn,et al.  Time to Contact Relative to a Planar Surface , 2007, 2007 IEEE Intelligent Vehicles Symposium.

[18]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Camillo J. Taylor,et al.  Expansion segmentation for visual collision detection and estimation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[20]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[21]  Aleix M. Martínez,et al.  Non-rigid structure from motion with complementary rank-3 spaces , 2011, CVPR 2011.

[22]  Naokazu Yokoya,et al.  Epipolar geometry estimation for wide-baseline omnidirectional street view images , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[23]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[24]  Li Xu,et al.  Scale Invariant Optical Flow , 2012, ECCV.

[25]  Sebastian Scherer,et al.  First results in detecting and avoiding frontal obstacles from a monocular camera for micro unmanned aerial vehicles , 2013, 2013 IEEE International Conference on Robotics and Automation.

[26]  Simon Lucey,et al.  Complex Non-rigid Motion 3D Reconstruction by Union of Subspaces , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  James J. Gibson,et al.  The Ecological Approach to Visual Perception: Classic Edition , 2014 .

[28]  Zhuowen Tu,et al.  Scale-Space SIFT flow , 2014, IEEE Winter Conference on Applications of Computer Vision.

[29]  Rui Yu,et al.  Video Pop-up: Monocular 3D Reconstruction of Dynamic Scenes , 2014, ECCV.

[30]  Torsten Sattler,et al.  Camera Pose Voting for Large-Scale Image-Based Localization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Konrad Schindler,et al.  3D Scene Flow Estimation with a Piecewise Rigid Scene Model , 2015, International Journal of Computer Vision.

[34]  Russ Tedrake,et al.  Integrated Perception and Control at High Speed: Evaluating Collision Avoidance Maneuvers Without Maps , 2016, WAFR.

[35]  Thomas Brox,et al.  A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Carolina Raposo,et al.  Theory and Practice of Structure-From-Motion Using Affine Correspondences , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Wolfram Burgard,et al.  Rigid scene flow for 3D LiDAR scans , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[38]  Bernd Jähne,et al.  The HCI Benchmark Suite: Stereo and Flow Ground Truth with Uncertainties for Urban Autonomous Driving , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[39]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Vladlen Koltun,et al.  Dense Monocular Depth Estimation in Complex Dynamic Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Hongdong Li,et al.  Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Bingbing Ni,et al.  Unsupervised Deep Learning for Optical Flow Estimation , 2017, AAAI.

[44]  Ning Zhang,et al.  AutoScaler: Scale-Attention Networks for Visual Correspondence , 2016, BMVC.

[45]  Thomas Brox,et al.  Sparsity Invariant CNNs , 2017, 2017 International Conference on 3D Vision (3DV).

[46]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Shahram Izadi,et al.  StereoNet: Guided Hierarchical Refinement for Real-Time Edge-Aware Depth Prediction , 2018, ECCV.

[48]  Naira Hovakimyan,et al.  Guaranteed Collision Avoidance Based on Line-of-Sight Angle and Time-to-Collision , 2018, 2018 Annual American Control Conference (ACC).

[49]  Daniel Barath,et al.  Recovering affine features from orientation- and scale-invariant ones , 2018, ACCV.

[50]  Sertac Karaman,et al.  The Blackbird Dataset: A large-scale dataset for UAV perception in aggressive flight , 2018, ISER.

[51]  Stefan Roth,et al.  UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss , 2017, AAAI.

[52]  Jan-Michael Frahm,et al.  Augmenting Crowd-Sourced 3D Reconstructions Using Semantic Detections , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Thomas Brox,et al.  Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation , 2018, ECCV.

[54]  Didier Stricker,et al.  Combining Stereo Disparity and Optical Flow for Basic Scene Flow , 2018, ArXiv.

[55]  Moritz Menze,et al.  Object scene flow , 2017, ISPRS Journal of Photogrammetry and Remote Sensing.

[56]  Eshed Ohn-Bar,et al.  Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges , 2019, ArXiv.

[57]  Deva Ramanan,et al.  Volumetric Correspondence Networks for Optical Flow , 2019, NeurIPS.

[58]  Jeannette Bohg,et al.  MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[59]  Gabriel J. Brostow,et al.  Digging Into Self-Supervised Monocular Depth Estimation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[60]  Yong Jae Lee,et al.  HPLFlowNet: Hierarchical Permutohedral Lattice FlowNet for Scene Flow Estimation on Large-Scale Point Clouds , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Zuzana Kukelova,et al.  Homography From Two Orientation- and Scale-Covariant Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[62]  Ruigang Yang,et al.  GA-Net: Guided Aggregation Net for End-To-End Stereo Matching , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Michael R. Lyu,et al.  SelFlow: Self-Supervised Learning of Optical Flow , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  Rudolf Mester,et al.  Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[65]  Michael Happold,et al.  Hierarchical Deep Stereo Matching on High-Resolution Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Rui Hu,et al.  Deep Rigid Instance Scene Flow , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Leonidas J. Guibas,et al.  FlowNet3D: Learning Scene Flow in 3D Point Clouds , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).