The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields

Our excellent perception of moving objects rests on both a good understanding of geometrical concepts and a broad familiarity with objects. The human ability to detect and segment moving objects works in the presence of multiple objects, complex background geometry, motion of the observer, and even camouflage. How humans perceive moving objects so reliably is a longstanding research question in computer vision, one that draws on findings from related areas such as psychology, cognitive science, and physics. One approach to the problem is to teach a deep network to model all of these effects. This contrasts with the strategy used by human vision, where cognitive processes and body design are tightly coupled and each is responsible for certain aspects of correctly identifying moving objects. Similarly, from the computer vision perspective, there is evidence that classical, geometry-based techniques are better suited to the "motion-based" parts of the problem, while deep networks are more suitable for modeling appearance. In this work, we argue that the coupling of camera rotation and camera translation can create complex motion fields that are difficult for a deep network to untangle directly. We present a novel probabilistic model to estimate the camera's rotation given the motion field. We then rectify the flow field by removing the rotation-induced component, obtaining a rotation-compensated motion field for subsequent segmentation. This strategy of first estimating camera motion, and then allowing a network to learn the remaining parts of the problem, yields improved results on the widely used DAVIS benchmark as well as on the recently published motion segmentation data set MoCA (Moving Camouflaged Animals).
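To make the rectification step concrete, the following is a minimal sketch of how a rotation-compensated flow field could be computed once a rotation estimate is available. It does not reproduce the probabilistic rotation estimator described above; the rotation `omega` is simply assumed to be given. The sketch uses the standard small-rotation motion field equations for a pinhole camera, in which the rotational component of the flow does not depend on scene depth. The function names, the focal length `f`, the principal point at the image centre, and the sign convention are all assumptions made for illustration.

```python
import numpy as np


def rotational_flow(omega, f, width, height):
    """Flow (in pixels) induced by a small camera rotation.

    omega: (wx, wy, wz) rotation about the camera axes, in radians per frame.
    f: focal length in pixels (assumed known for this sketch).
    The principal point is assumed to lie at the image centre.
    Returns an array of shape (height, width, 2) holding (u, v).
    """
    wx, wy, wz = omega
    # Pixel coordinates measured relative to the assumed principal point.
    xs = np.arange(width) - (width - 1) / 2.0
    ys = np.arange(height) - (height - 1) / 2.0
    x, y = np.meshgrid(xs, ys)
    # Rotational part of the instantaneous motion field; it has no depth term.
    # The signs follow one common convention and may differ in other texts.
    u = (x * y / f) * wx - (f + x**2 / f) * wy + y * wz
    v = (f + y**2 / f) * wx - (x * y / f) * wy - x * wz
    return np.stack([u, v], axis=-1)


def compensate_rotation(flow, omega, f):
    """Subtract the rotation-induced flow from an observed flow field."""
    height, width = flow.shape[:2]
    return flow - rotational_flow(omega, f, width, height)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for the translation- and object-induced part of the flow.
    residual = rng.normal(size=(4, 6, 2))
    omega = (0.01, -0.02, 0.005)
    observed = residual + rotational_flow(omega, 500.0, 6, 4)
    recovered = compensate_rotation(observed, omega, 500.0)
    print(np.allclose(recovered, residual))
```

In this sketch the compensated field contains only the flow caused by camera translation and independently moving objects, which is the kind of input the abstract proposes handing to the segmentation network.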
