A Framework for Evaluating 6-DOF Object Trackers

We present a challenging and realistic new dataset for evaluating 6-DOF object tracking algorithms. Existing datasets have serious limitations, notably unrealistic synthetic data or real data with large fiducial markers, that prevent the community from obtaining an accurate picture of the state of the art. Using a data acquisition pipeline based on a commercial motion capture system, which provides accurate ground truth poses of real objects with respect to a Kinect V2 camera, we build a dataset containing a total of 297 calibrated sequences. The sequences are acquired in three scenarios that evaluate tracker performance: stability, robustness to occlusion, and accuracy during challenging interactions between a person and the object. We also conduct an extensive study of a deep 6-DOF tracking architecture and determine a set of optimal parameters. Finally, we enhance the architecture and the training methodology to train a 6-DOF tracker that generalizes robustly to objects never seen during training, and we demonstrate favorable performance compared to previous approaches trained specifically on the objects to track.
