Long-Term Image Boundary Prediction

Boundary estimation in images and videos has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on estimating boundaries for observed frames, our work aims at predicting boundaries of future, unobserved frames. This requires our model to learn about the fate of boundaries and the corresponding motion patterns, including a notion of "intuitive physics". We experiment on natural video sequences as well as on synthetic sequences with deterministic physics-based and agent-based motions. Although this is not our primary goal, we also show that fusing RGB and boundary predictions leads to improved RGB predictions.