Applying Deep Learning in Augmented Reality Tracking

An existing deep learning architecture is adapted to solve the detection problem in camera-based tracking for augmented reality (AR). A known target, in this case a planar object, is rendered under varying viewing conditions, including changes in orientation, scale, illumination, and sensor noise. The resulting corpus is used to train a convolutional neural network to match patches in an incoming image against the target. The results show performance comparable to or better than state-of-the-art methods. The detector's timing performance still needs improvement, but when combined with a robust pose estimation process it yields promising results.
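As a rough illustration of the data-generation step described above, the following Python sketch renders a planar target under random viewpoint, scale, illumination, and sensor-noise perturbations to build a patch corpus. It is not the authors' code; the file name, patch size, and perturbation ranges are assumptions chosen for the example.

```python
# Hypothetical sketch (not the paper's implementation): synthesizing training
# patches of a planar target under random viewpoint, scale, illumination and
# sensor-noise perturbations, as outlined in the abstract.
import cv2
import numpy as np

def random_homography(size, max_tilt=0.3, scale_range=(0.7, 1.3)):
    """Sample a random homography by jittering and scaling the patch corners."""
    h, w = size
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = np.float32(np.random.uniform(-max_tilt, max_tilt, (4, 2))) * [w, h]
    scale = np.random.uniform(*scale_range)
    dst = (src - [w / 2, h / 2]) * scale + [w / 2, h / 2] + jitter
    return cv2.getPerspectiveTransform(src, np.float32(dst))

def synthesize_view(target, patch_size=(64, 64), noise_sigma=5.0):
    """Warp the planar target, perturb illumination, and add sensor noise."""
    H = random_homography(target.shape[:2])
    warped = cv2.warpPerspective(target, H, target.shape[1::-1])
    gain = np.random.uniform(0.6, 1.4)            # global illumination change
    bias = np.random.uniform(-20, 20)
    noisy = warped.astype(np.float32) * gain + bias
    noisy += np.random.normal(0, noise_sigma, noisy.shape)  # sensor noise
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    return cv2.resize(noisy, patch_size)

if __name__ == "__main__":
    # "target.png" is a placeholder for the known planar target image.
    target = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)
    training_patches = [synthesize_view(target) for _ in range(10000)]
```

Patches produced this way could then be used to train a CNN patch matcher, with the resulting correspondences passed to a robust (e.g. RANSAC-style) pose estimator, as the abstract describes.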
