An Online Vision System for Understanding Complex Assembly Tasks

We present an integrated system for the recognition, pose estimation, and simultaneous tracking of multiple objects in 3D scenes. Our target application is a complete semantic representation of dynamic scenes, which requires three essential steps: recognizing the objects, tracking their movements, and identifying the interactions between them. We address this challenge with a complete system that uses object recognition and pose estimation to initialize object models and trajectories, a dynamic sequential octree structure to enable full 6-DoF tracking through occlusions, and a graph-based semantic representation to distil interactions. We evaluate the proposed method on real scenarios by comparing tracked outputs to ground-truth trajectories, and we compare the results against Iterative Closest Point and particle-filter-based trackers.
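One of the baselines mentioned above, Iterative Closest Point (ICP), can be illustrated with a minimal point-to-point sketch. This is not the authors' implementation: the function names, the brute-force nearest-neighbour matching, and the fixed iteration count are illustrative simplifications (a practical tracker would use a k-d tree and a convergence criterion).

```python
import numpy as np

def best_fit_transform(A, B):
    """Least-squares rigid transform (R, t) mapping point set A onto B (Kabsch/SVD)."""
    cA, cB = A.mean(axis=0), B.mean(axis=0)
    H = (A - cA).T @ (B - cB)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:       # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cB - R @ cA
    return R, t

def icp(source, target, iters=30):
    """Minimal point-to-point ICP: alternate nearest-neighbour matching
    and a closed-form rigid update, accumulating the total transform."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # Brute-force nearest neighbours (fine for small clouds).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        R, t = best_fit_transform(src, matched)
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

With a good initial pose from the recognition stage (as in the system described above), a few such iterations typically suffice per frame; without that initialization, ICP is prone to local minima, which motivates the paper's comparison.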
