Server-side object recognition and client-side object tracking for mobile augmented reality

In this paper we present a system for mobile augmented reality (AR) based on visual recognition. We split the tasks of recognizing an object and tracking it on the user's screen into a server-side and a client-side task, respectively. The capabilities of this hybrid client-server approach are demonstrated with a prototype application on the Android platform, which is able to augment both stationary (landmarks) and non stationary (media covers) objects. The database on the server side consists of hundreds of thousands of landmarks, which is crawled using a state of the art mining method for community photo collections. In addition to the landmark images, we also integrate a database of media covers with millions of items. Retrieval from these databases is done using vocabularies of local visual features. In order to fulfill the real-time constraints for AR applications, we introduce a method to speed-up geometric verification of feature matches. The client-side tracking of recognized objects builds on a multi-modal combination of visual features and sensor measurements. Here, we also introduce a motion estimation method, which is more efficient and precise than similar approaches. To the best of our knowledge this is the first system, which demonstrates a complete pipeline for augmented reality on mobile devices with visual object recognition scaled to millions of objects combined with real-time object tracking.

[1]  Luc Van Gool,et al.  Markerless augmented reality with a real-time affine region tracker , 2001, Proceedings IEEE and ACM International Symposium on Augmented Reality.

[2]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  Dieter Schmalstieg,et al.  First steps towards handheld augmented reality , 2003, Seventh IEEE International Symposium on Wearable Computers, 2003. Proceedings..

[4]  David G. Lowe,et al.  Scene modelling, recognition and tracking with invariant image features , 2004, Third IEEE and ACM International Symposium on Mixed and Augmented Reality.

[5]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6]  Michael Rohs,et al.  USING CAMERA-EQUIPPED MOBILE PHONES FOR INTERACTING WITH REAL-WORLD OBJECTS , 2004 .

[7]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[9]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Natasha Gelfand,et al.  Viewfinder Alignment , 2008, Comput. Graph. Forum.

[11]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[12]  Bernd Girod,et al.  Outdoors augmented reality on mobile phone using loxel-based visual feature organization , 2008, MIR '08.

[13]  Dieter Schmalstieg,et al.  Pose tracking from natural features on mobile phones , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[14]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[15]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  David W. Murray,et al.  Improving the Agility of Keyframe-Based SLAM , 2008, ECCV.

[17]  Dieter Schmalstieg,et al.  Robust and unobtrusive marker tracking on mobile phones , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[18]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[19]  Natasha Gelfand,et al.  SURFTrac: Efficient tracking and continuous object recognition using local feature descriptors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[21]  David W. Murray,et al.  Parallel Tracking and Mapping on a camera phone , 2009, 2009 8th IEEE International Symposium on Mixed and Augmented Reality.

[22]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.