Real-time scalable recognition and tracking based on the server-client model for mobile Augmented Reality

Recent advances in mobile devices and vision technology have enabled mobile Augmented Reality (AR) to run in real time using natural features. However, a user who views augmented reality while moving about continually encounters new and diverse target objects in different locations. Whether an AR system scales with the number of target objects is therefore an important issue for future mobile AR services. So far, this scalability has been severely limited by the small internal storage and memory capacity of mobile devices. In this paper, a new framework is proposed that achieves scalability for mobile AR. Scalability is achieved by running a bag-of-visual-words recognition module on the server side, connected to the client through conventional Wi-Fi. On the client side, the mobile phone tracks and augments target objects in real time based on natural features. In the experiment, a cold start of the AR service on a 10k-object database takes 0.2 seconds with a recognition accuracy of 95%, which is acceptable for a real-world mobile AR application.
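To make the server-side recognition step more concrete, the following is a minimal sketch of bag-of-visual-words retrieval. It is not the implementation described in the paper. It assumes a small flat vocabulary, nearest-word quantization by squared Euclidean distance, and smoothed TF-IDF weighting with cosine scoring. The names `quantize`, `BowIndex`, `add`, and `query` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Return the index of the nearest visual word (squared Euclidean distance)."""
    def dist(i):
        return sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[i]))
    return min(range(len(vocabulary)), key=dist)


class BowIndex:
    """Toy bag-of-visual-words index (assumption: flat vocabulary, TF-IDF, cosine score)."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.histograms = {}               # object id -> Counter of visual-word counts
        self.inverted = defaultdict(set)   # visual word -> ids of objects containing it

    def add(self, object_id, descriptors):
        """Register a target object from its feature descriptors."""
        words = Counter(quantize(d, self.vocabulary) for d in descriptors)
        self.histograms[object_id] = words
        for w in words:
            self.inverted[w].add(object_id)

    def _weights(self, counts):
        """L2-normalized TF-IDF vector over words that occur in the database."""
        n = len(self.histograms)
        vec = {}
        for w, c in counts.items():
            if w in self.inverted:
                # Smoothed IDF so that words present in every object keep a nonzero weight.
                vec[w] = c * math.log(1.0 + n / len(self.inverted[w]))
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        return {w: x / norm for w, x in vec.items()}

    def query(self, descriptors):
        """Return (object id, score) pairs, best match first."""
        q = self._weights(Counter(quantize(d, self.vocabulary) for d in descriptors))
        # The inverted index restricts scoring to objects sharing at least one word.
        candidates = set()
        for w in q:
            candidates |= self.inverted[w]
        scores = {}
        for oid in candidates:
            t = self._weights(self.histograms[oid])
            scores[oid] = sum(q[w] * t.get(w, 0.0) for w in q)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a server-client setup of the kind the abstract describes, the client would send a query frame (or its descriptors) over the network, the server would run something like `query` against the full object database, and the client would then track the returned target locally. A real system at the 10k-object scale would typically replace the linear `quantize` scan with a faster structure such as a hierarchical vocabulary.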
