Improved Image-Based Localization Using SFM and Modified Coordinate System Transfer

Accurate localization of mobile devices based on camera-acquired visual media information usually requires a search over a very large GPS-referenced image database collected from social sharing websites such as Flickr or from services such as Google Street View. This paper proposes a new method for reliably estimating the actual query camera location by using structure from motion (SFM) for three-dimensional (3-D) camera position reconstruction and by introducing a new approach for applying a linear transformation between two different 3-D Cartesian coordinate systems. Since the success of SFM hinges on effectively selecting among the multiple retrieved images, we propose an optimization framework that selects the images with the highest intraclass similarity among those returned by the retrieval pipeline, in order to increase the SFM convergence rate. The selected images, along with the query, are then used to reconstruct a 3-D scene and to find the relative camera positions by employing SFM. In the last processing step, a camera coordinate transformation algorithm is introduced to estimate the query's geo-tag. The influence of the number of images involved in SFM on the final position error is investigated by examining the use of three and four dataset images, with different solutions for calculating the query world coordinates. We evaluated the proposed method on query images with accurately known ground truth. Experimental results demonstrate that our method outperforms other reported methods in terms of average error.
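The coordinate system transfer step can be illustrated with a minimal sketch. It is not the modified transformation proposed in this paper; it uses the standard closed-form least-squares similarity alignment (in the spirit of Horn's absolute orientation solution) as a stand-in. The function names `fit_similarity` and `to_world`, the synthetic camera positions, and the assumption that geo-tags have already been converted to a local metric Cartesian frame are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    # Cross-covariance between the centered point sets.
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * (R @ mu_s)
    return s, R, t


def to_world(point, s, R, t):
    """Map a point from the SFM frame to the world frame."""
    return s * (R @ np.asarray(point, dtype=float)) + t


# Synthetic example (assumed values): four dataset cameras and one query
# camera in the arbitrary SFM frame.
sfm_cams = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.1],
                     [0.2, 1.0, 0.0],
                     [1.1, 0.9, 0.3]])
query_sfm = np.array([0.5, 0.4, 0.05])

# Ground-truth transform used only to generate the synthetic world positions.
a, b = 0.5, 0.2
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a), np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b), np.cos(b)]])
R_true, s_true, t_true = Rz @ Rx, 12.5, np.array([100.0, -40.0, 2.0])
world_cams = s_true * sfm_cams @ R_true.T + t_true

# Fit on the dataset cameras, then transfer the query camera position.
s_est, R_est, t_est = fit_similarity(sfm_cams, world_cams)
query_world = to_world(query_sfm, s_est, R_est, t_est)
```

With noise-free correspondences the fit recovers the generating transform; with real geo-tags the residual of the fit on the dataset cameras gives a rough indication of how much the three- or four-image configuration can be trusted.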
