Temporal Semantic Motion Segmentation Using Spatio Temporal Optimization

Segmenting moving objects in a video sequence has been a challenging problem and critical to outdoor robotic navigation. While recent literature has laid focus on regularizing object labels over a sequence of frames, exploiting the spatio-temporal features for motion segmentation has been scarce. Particularly in real world dynamic scenes, existing approaches fail to exploit temporal consistency in segmenting moving objects with large camera motion.

[1]  Julius Ziegler,et al.  StereoScan: Dense 3d reconstruction in real-time , 2011, 2011 IEEE Intelligent Vehicles Symposium (IV).

[2]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  K. Madhava Krishna,et al.  Moving object detection by multi-view geometric techniques from a single camera mounted robot , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[5]  Nikos Komodakis,et al.  MRF Energy Minimization and Beyond via Dual Decomposition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Kurt Keutzer,et al.  Dense Point Trajectories by GPU-Accelerated Large Displacement Optical Flow , 2010, ECCV.

[8]  S. Shankar Sastry,et al.  Optimal segmentation of dynamic scenes from two perspective views , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[9]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[11]  Venu Madhav Govindu,et al.  Efficient Higher-Order Clustering on the Grassmann Manifold , 2013, 2013 IEEE International Conference on Computer Vision.

[12]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Wolfram Burgard,et al.  SMSnet: Semantic motion segmentation using deep convolutional neural networks , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[18]  K. Madhava Krishna,et al.  Semantic Motion Segmentation Using Dense CRF Formulation , 2014, ICVGIP.

[19]  K. Madhava Krishna,et al.  Dynamic body VSLAM with semantic constraints , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20]  Jitendra Malik,et al.  Learning to segment moving objects in videos , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[22]  Sebastian Ramos,et al.  Vision-Based Offline-Online Perception Paradigm for Autonomous Driving , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[23]  K. Madhava Krishna,et al.  Using In-frame Shear Constraints for Monocular Motion Segmentation of Rigid Bodies , 2015, Journal of Intelligent & Robotic Systems.

[24]  Xiaogang Wang,et al.  Pedestrian Behavior Understanding and Prediction with Deep Neural Networks , 2016, ECCV.

[25]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling , 2015, CVPR 2015.

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Tao Chen,et al.  Object-Level Motion Detection From Moving Cameras , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[28]  Heiko Hirschmüller,et al.  Stereo Processing by Semiglobal Matching and Mutual Information , 2008, IEEE Trans. Pattern Anal. Mach. Intell..

[29]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[30]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[31]  Yang Yu,et al.  Multi-label hypothesis reuse , 2012, KDD.

[32]  Vladlen Koltun,et al.  Feature Space Optimization for Semantic Video Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Nazrul Haque,et al.  Joint Semantic and Motion Segmentation for Dynamic Scenes using Deep Convolutional Networks , 2017, VISIGRAPP.

[34]  Cordelia Schmid,et al.  DeepFlow: Large Displacement Optical Flow with Deep Matching , 2013, 2013 IEEE International Conference on Computer Vision.

[35]  Vasileios Zografos,et al.  Fast and accurate motion segmentation using Linear Combination of Views , 2011, BMVC.