Temporally Coherent 3D Point Cloud Video Segmentation in Generic Scenes

Video segmentation is an important building block for high-level applications such as scene understanding and interaction analysis. While state-of-the-art learning-based and model-based methods achieve outstanding results in this field, they are restricted to certain types of scenes or require a large amount of annotated training data to segment objects in generic scenes. RGBD data, on the other hand, have become widely available with the introduction of consumer depth sensors and provide real-world 3D geometry that 2D images lack. The explicit geometry in RGBD data greatly helps in computer vision tasks, but the lack of annotations for this type of data may hinder the extension of learning-based methods to RGBD. In this paper, we present a novel generic segmentation approach for 3D point cloud video (stream data) that thoroughly exploits the explicit geometry in RGBD. Our proposal is based only on low-level features, such as connectivity and compactness. We exploit temporal coherence by representing the rough estimate of the objects in a single frame with a hierarchical structure and propagating this hierarchy over time. The hierarchical structure provides an efficient way to establish temporal correspondences at different scales of object connectivity and to manage the splits and merges of objects over time. This allows the segmentation to be updated according to the evidence observed in the history. The proposed method is evaluated on several challenging data sets, with promising results.
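The idea of a connectivity hierarchy that is matched across frames can be illustrated with a small sketch. This is not the method of the paper, whose details the abstract does not give. It assumes, purely for illustration, that connectivity is a Euclidean distance threshold between points, that the hierarchy is the set of connected components at increasing thresholds, and that temporal correspondence is a nearest-centroid match. All function names (`connected_components`, `connectivity_hierarchy`, `centroids`, `match_segments`) are hypothetical.

```python
from itertools import combinations
from math import dist


def connected_components(points, radius):
    """Label points so that two points share a label when a chain of
    neighbours no farther apart than `radius` links them (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force pairwise check: fine for a toy example, quadratic in size.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]


def connectivity_hierarchy(points, radii):
    """One labelling per radius, from fine to coarse. Components at a larger
    radius are unions of components at smaller radii, so the levels nest."""
    return [connected_components(points, r) for r in sorted(radii)]


def centroids(points, labels):
    """Mean 3D position of each labelled segment."""
    sums = {}
    for p, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0, 0.0, 0.0, 0])
        for k in range(3):
            acc[k] += p[k]
        acc[3] += 1
    return {label: tuple(c / acc[3] for c in acc[:3]) for label, acc in sums.items()}


def match_segments(prev_centroids, curr_centroids):
    """Assumed stand-in for temporal correspondence: map each current
    segment to the previous-frame segment with the nearest centroid."""
    return {
        curr: min(prev_centroids, key=lambda prev: dist(prev_centroids[prev], c))
        for curr, c in curr_centroids.items()
    }


# Two small clusters, observed in two frames with different point order.
frame_a = [(0.0, 0, 0), (0.1, 0, 0), (1.0, 0, 0), (1.1, 0, 0)]
frame_b = [(1.05, 0, 0), (1.15, 0, 0), (0.05, 0, 0), (0.15, 0, 0)]

fine_a, coarse_a = connectivity_hierarchy(frame_a, [0.2, 2.0])
fine_b, _ = connectivity_hierarchy(frame_b, [0.2, 2.0])
mapping = match_segments(centroids(frame_a, fine_a), centroids(frame_b, fine_b))
```

In this toy setup the fine level separates the two clusters and the coarse level merges them, which is the kind of multi-scale evidence a split or merge decision could draw on. A practical version would replace the brute-force pairwise loop with a spatial index and would need a matching rule that is robust to objects touching or separating.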
