Improving mobile MR applications using a cloud-based image segmentation approach with synthetic training data

In this paper, we show how the quality of augmentation in mobile Mixed Reality applications can be improved using a cloud-based image segmentation approach with synthetic training data. Many modern Augmented Reality frameworks rely on visual-inertial odometry on mobile devices and therefore have limited access to dedicated tracking hardware (e.g., depth sensors). Consequently, tracking still suffers from drift, which makes it difficult to use in scenarios that require higher precision. To improve tracking quality, we propose a cloud tracking approach that uses machine-learning-based image segmentation to recognize known objects in a real scene, which allows us to estimate a precise camera pose. Augmented Reality applications that utilize our web service can use the resulting camera pose to correct drift periodically, while still relying on local tracking between key frames. Moreover, the device's position in the real world at application start is usually used as the reference coordinate system. Our approach instead provides a well-defined, context-based coordinate system that does not depend on the user's starting position, which significantly simplifies the authoring of MR applications. We present all steps, from web-based initialization through the generation of synthetic training data to usage in production, and describe the underlying algorithms in detail. Finally, we present a mobile Mixed Reality application based on this novel approach and discuss its advantages.
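
To make the drift-correction step more concrete, the following minimal sketch (Python with NumPy) illustrates one way such a correction could look on the client: whenever the cloud service returns a segmentation-based pose for a key frame, the client derives a rigid correction transform and applies it to subsequent locally tracked poses. The function names and the simulated values are illustrative assumptions, not the paper's actual implementation or API.

```python
# Hypothetical sketch of the drift-correction idea described above.
# The client keeps using on-device visual-inertial tracking, but when the
# cloud service returns a pose for a key frame, a correction transform is
# computed and applied to later local poses. All names are illustrative.

import numpy as np


def correction_from_keyframe(local_pose_at_keyframe: np.ndarray,
                             cloud_pose_at_keyframe: np.ndarray) -> np.ndarray:
    """4x4 transform mapping drifted local poses into the well-defined,
    context-based coordinate system provided by the cloud service."""
    return cloud_pose_at_keyframe @ np.linalg.inv(local_pose_at_keyframe)


def corrected_pose(correction: np.ndarray, local_pose: np.ndarray) -> np.ndarray:
    """Apply the most recent correction to the current locally tracked pose."""
    return correction @ local_pose


if __name__ == "__main__":
    # Simulated values: the cloud estimate differs slightly from the local one.
    local_kf = np.eye(4)
    cloud_kf = np.eye(4)
    cloud_kf[:3, 3] = [0.02, -0.01, 0.005]   # small translational drift

    C = correction_from_keyframe(local_kf, cloud_kf)

    current_local = np.eye(4)
    current_local[:3, 3] = [0.5, 0.0, 0.3]   # pose from on-device tracking
    print(corrected_pose(C, current_local))  # pose in the shared coordinate system
```

Between key frames the correction stays fixed, so local tracking continues to run at full frame rate; only the occasional cloud response updates the transform.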
