SimTrack: A simulation-based framework for scalable real-time object pose detection and tracking

We propose a novel approach for real-time object pose detection and tracking that is highly scalable in terms of both the number of objects tracked and the number of cameras observing the scene. Key to this scalability is a high degree of parallelism in the algorithms employed. The method maintains a single 3D simulated model of the scene, consisting of multiple objects together with a robot operating on them. This allows for rapid synthesis of appearance, depth, and occlusion information from each camera viewpoint. This information is used both for updating the pose estimates and for extracting the low-level visual cues. The visual cues obtained from each camera are efficiently fused back into the single consistent scene representation using a constrained optimization method. The centralized scene representation, together with the reliability measures it enables, simplifies the interaction between pose tracking and pose detection across multiple cameras. We demonstrate the robustness of our approach in a realistic manipulation scenario. We publicly release this work as part of SimTrack, a general ROS software framework for real-time pose estimation that can be easily integrated into different robotic applications.
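The core idea above — one central scene model, per-camera cue extraction against rendered predictions, and fusion of all cues back into that single representation — can be illustrated with a minimal sketch. This is not the SimTrack implementation: the class and function names are hypothetical, translation-only poses stand in for full 6-DOF poses, a per-object offset stands in for rendering appearance/depth/occlusion, and a simple average stands in for the constrained optimization.

```python
# Hypothetical sketch of a SimTrack-style loop (all names are assumptions).
# One central scene holds every object's pose; each camera contributes a
# correction, and the corrections are fused into the single scene model.

class Scene:
    """Central 3D scene: object name -> pose (translation only, for brevity)."""
    def __init__(self, poses):
        self.poses = dict(poses)

    def render(self, camera_offset):
        # Stand-in for synthesizing appearance/depth/occlusion from one
        # viewpoint: predict each object's position in the camera frame.
        return {name: tuple(p + o for p, o in zip(pose, camera_offset))
                for name, pose in self.poses.items()}

def camera_correction(predicted, observed):
    # Stand-in for low-level visual-cue extraction: the translation that
    # aligns the rendered prediction with this camera's observation.
    return {name: tuple(obs - pred
                        for pred, obs in zip(predicted[name], observed[name]))
            for name in predicted}

def fuse(scene, corrections):
    # Stand-in for the constrained optimization: average the per-camera
    # corrections and apply them to the single consistent scene.
    for name in scene.poses:
        deltas = [c[name] for c in corrections]
        mean = tuple(sum(d[i] for d in deltas) / len(deltas) for i in range(3))
        scene.poses[name] = tuple(p + m
                                  for p, m in zip(scene.poses[name], mean))

# Two cameras observe one object whose true position is (1, 0, 0) while the
# scene still believes (0, 0, 0); both cameras agree, so fusion recovers it.
scene = Scene({"box": (0.0, 0.0, 0.0)})
true_pose = (1.0, 0.0, 0.0)
camera_offsets = [(0.0, 0.0, 2.0), (1.0, -1.0, 2.0)]
corrections = []
for off in camera_offsets:
    predicted = scene.render(off)
    observed = {"box": tuple(t + o for t, o in zip(true_pose, off))}
    corrections.append(camera_correction(predicted, observed))
fuse(scene, corrections)
print(scene.poses["box"])  # -> (1.0, 0.0, 0.0)
```

Because every camera writes into the same scene object, adding cameras or objects only adds independent render/extract steps — the parallelism the abstract identifies as the source of scalability.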
