Incremental image set querying based localization

Image-based localization has been developed for many applications, such as mobile localization, autonomous navigation, augmented reality, and photo tourism. When a query image is matched against a pre-built 3D feature point cloud, its camera pose can be estimated for later use. However, when the query image is distant from the pre-built 3D point cloud, conventional single-image localization methods fail. To address this problem, we present an incremental image-set-querying localization framework. When single-image localization fails, the system incrementally asks the user to supply additional auxiliary images until localization is successful and stable. The main idea is to match an image set, rather than a single image, against the pre-built 3D point cloud. The image set is then incrementally enlarged and aggregated to form a local 3D model. Compared with a single query image, this query 3D model contains more information and more geometric constraints, both of which are essential for localization. Experiments demonstrate the effectiveness and feasibility of the proposed framework.
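The incremental querying loop described above can be sketched as follows. This is only a minimal illustration of the control flow, not the authors' implementation: the names `localize_incrementally`, `localize_set`, `is_stable`, and `max_images` are hypothetical, and the matching, local 3D model construction, and pose estimation are assumed to happen inside the caller-supplied `localize_set` function.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

_END = object()  # sentinel marking that no more auxiliary images are available


def localize_incrementally(
    first_image: Any,
    auxiliary_images: Iterable[Any],
    localize_set: Callable[[List[Any]], Optional[Any]],
    is_stable: Callable[[Any], bool],
    max_images: int = 10,
) -> Tuple[Optional[Any], int]:
    """Grow the query image set until localization succeeds and is stable.

    localize_set (hypothetical) takes the current image set, builds whatever
    local model it needs, and returns a pose, or None on failure.
    is_stable (hypothetical) decides whether a returned pose can be trusted.
    Returns (pose or None, number of images used).
    """
    query_set = [first_image]
    aux = iter(auxiliary_images)
    while True:
        # Try to localize with the whole current set, not just one image.
        pose = localize_set(query_set)
        if pose is not None and is_stable(pose):
            return pose, len(query_set)
        # Give up once the assumed image budget is spent.
        if len(query_set) >= max_images:
            return None, len(query_set)
        # Otherwise ask for one more auxiliary image, if any remain.
        nxt = next(aux, _END)
        if nxt is _END:
            return None, len(query_set)
        query_set.append(nxt)
```

The loop starts with the single query image, so it reduces to ordinary single-image localization whenever the first attempt succeeds; auxiliary images are requested only on failure or instability.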
