Virtually augmenting hundreds of real pictures: An approach based on learning, retrieval, and tracking

Tracking is a major issue of virtual and augmented reality applications. Single object tracking on monocular video streams is fairly well understood. However, when it comes to multiple objects, existing methods lack scalability and can recognize only a limited number of objects. Thanks to recent progress in feature matching, state-of-the-art image retrieval techniques can deal with millions of images. However, these methods do not focus on real-time video processing and can not track retrieved objects. In this paper, we present a method that combines the speed and accuracy of tracking with the scalability of image retrieval. At the heart of our approach is a bi-layer clustering process that allows our system to index and retrieve objects based on tracks of features, thereby effectively summarizing the information available on multiple video frames. As a result, our system is able to track in real-time multiple objects, recognized with low delay from a database of more than 300 entries.

[1]  Vincent Lepetit,et al.  Multiple 3D Object tracking for augmented reality , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[2]  Camille Scherrer,et al.  Souvenirs du monde des montagnes , 2009, SIGGRAPH Art Gallery.

[3]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[4]  Mark Fiala,et al.  ARTag, a fiducial marker system using digital techniques , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Vincent Lepetit,et al.  Fast Keypoint Recognition in Ten Lines of Code , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[7]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Vincent Lepetit,et al.  Point matching as a classification problem for fast and robust object pose estimation , 2004, CVPR 2004.

[9]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[10]  Ivan Poupyrev,et al.  Virtual object manipulation on a table-top AR environment , 2000, Proceedings IEEE and ACM International Symposium on Augmented Reality (ISAR 2000).

[11]  Vincent Lepetit,et al.  Monocular Model-Based 3D Tracking of Rigid Objects: A Survey , 2005, Found. Trends Comput. Graph. Vis..

[12]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[13]  Simon Baker,et al.  Lucas-Kanade 20 Years On: A Unifying Framework , 2004, International Journal of Computer Vision.

[14]  Dieter Schmalstieg,et al.  Pose tracking from natural features on mobile phones , 2008, 2008 7th IEEE/ACM International Symposium on Mixed and Augmented Reality.

[15]  Dieter Schmalstieg,et al.  Multiple target detection and tracking with guaranteed framerates on mobile phones , 2009, 2009 8th IEEE International Symposium on Mixed and Augmented Reality.

[16]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Stepán Obdrzálek,et al.  Sub-linear Indexing for Large Scale Object Recognition , 2005, BMVC.

[18]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[20]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[21]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[22]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[23]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.