Deep Multi-state Object Pose Estimation for Augmented Reality Assembly

Neural-network-based machine learning approaches have been applied with significant success to object classification and detection. A related problem with its own constraints and challenges is object state estimation, which concerns objects that consist of several removable or adjustable parts. A system that can detect the current state of such an object from camera images is valuable for Augmented Reality (AR) and for robotic assembly and maintenance applications. In this work, we present a convolutional neural network (CNN) that detects an object in multiple states and regresses its pose. We then show how the output of this network can drive an automatically generated AR scenario that guides the user step by step through assembling an object consisting of multiple components.
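As a rough illustration of how a per-state network output could drive step-by-step guidance, the following Python sketch maps a vector of state scores to the next instruction. It is not the method of this work: the names `StateEstimate`, `pick_state`, and `next_instruction`, the confidence threshold, and the assumption that assembly states are linearly ordered (state k means the first k parts are mounted) are all hypothetical, and the regressed pose is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StateEstimate:
    """Hypothetical container for the most likely assembly state."""
    state_index: int   # assumed ordering: state k = first k parts mounted
    confidence: float  # score of that state, assumed to lie in [0, 1]


def pick_state(scores: Sequence[float]) -> StateEstimate:
    """Select the highest-scoring state from per-state scores."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return StateEstimate(best, scores[best])


def next_instruction(
    estimate: StateEstimate,
    steps: Sequence[str],
    min_confidence: float = 0.5,  # assumed threshold, not from the source
) -> Optional[str]:
    """Return the instruction leading from the detected state to the next one.

    steps[k] is assumed to describe the transition from state k to k + 1,
    so an object with len(steps) + 1 states is complete in the last state.
    """
    if estimate.confidence < min_confidence:
        return None  # too uncertain to show guidance
    if estimate.state_index >= len(steps):
        return "Assembly complete."
    return steps[estimate.state_index]


# Usage: three states (empty, part A mounted, parts A and B mounted).
steps = ["Attach part A", "Attach part B"]
print(next_instruction(pick_state([0.1, 0.7, 0.2]), steps))
```

In an AR application, the returned instruction would be rendered at the regressed object pose. Returning `None` below the threshold is one simple way to avoid displaying guidance for an uncertain detection.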