Point cloud video object segmentation using a persistent supervoxel world-model

Robust visual tracking is an essential precursor to understanding and replicating human actions in robotic systems. To accurately evaluate the semantic meaning of a sequence of video frames, or to replicate an action contained therein, one must be able to coherently track and segment all observed agents and objects. This work proposes a novel online, point-cloud-based algorithm that simultaneously tracks the 6-DoF pose and determines the spatial extent of all entities in indoor scenarios. This is accomplished using a persistent supervoxel world-model which is updated, rather than replaced, as new frames of data arrive. Maintaining a world-model enables general object permanence, permitting successful tracking through full occlusions. Object models are tracked by a bank of independent adaptive particle filters, which use a supervoxel observation model to give rough estimates of object state. These estimates are united by a novel multi-model, RANSAC-like approach that minimizes a global energy function associating world-model supervoxels with predicted states. We present results on a standard robotic assembly benchmark for two application scenarios, human trajectory imitation and semantic action understanding, demonstrating the usefulness of the tracking in intelligent robotic systems.
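The core loop described above, per-object particle-filter prediction followed by an energy-minimizing association of world-model supervoxels to the predicted states, can be sketched as follows. This is only an illustrative simplification under assumed names (`predict_states`, `associate_supervoxels`, `outlier_cost`): the actual method uses KLD-sampling adaptive particle filters over 6-DoF poses and a global multi-model energy, whereas this sketch uses 3-D centroids, a constant-position motion model, and a greedy per-supervoxel data cost with an outlier label.

```python
import math
import random

def predict_states(filters):
    """Propagate each per-object particle filter one step and return a
    rough state estimate (here: the mean 3-D centroid) per object.
    A real implementation would sample 6-DoF poses and reweight the
    particles against the supervoxel observation model."""
    estimates = []
    for particles in filters:
        # Constant-position motion model with Gaussian process noise.
        moved = [tuple(c + random.gauss(0.0, 0.01) for c in p) for p in particles]
        n = len(moved)
        estimates.append(tuple(sum(p[i] for p in moved) / n for i in range(3)))
    return estimates

def associate_supervoxels(supervoxels, estimates, outlier_cost=1.0):
    """Assign each supervoxel centroid to the predicted object state with
    the lowest data cost (Euclidean distance), or to the outlier label -1
    when every model costs more than outlier_cost. The paper's RANSAC-like
    approach instead minimizes a single global energy over all labels."""
    labels = []
    for sv in supervoxels:
        costs = [math.dist(sv, e) for e in estimates]
        best = min(range(len(costs)), key=costs.__getitem__)
        labels.append(best if costs[best] < outlier_cost else -1)
    return labels
```

With two objects near the origin and near (5, 5, 5), a nearby supervoxel is claimed by each tracked model, while a distant one falls to the outlier label, which is how an occluded or newly revealed region would remain unassigned until the world-model is updated.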
