On-Vehicle Videos Localization Using Geometric and Spatio-Temporal Information

Recently, a number of studies have sought to reconstruct real cities in digital form for purposes such as web services, intelligent transportation systems (ITS), disaster analysis, and landscape simulation. Moreover, with the spread of on-vehicle video cameras, sharing on-vehicle videos on the web has become common. If the locations of these videos were available, the data could be used efficiently for virtual city construction. In this paper, the authors propose a method for localizing anonymous on-vehicle videos uploaded to the web, using a video matching technique based on Temporal Height Image (THI), Affine SIFT, and Bag of Features (BoF). THI retains the relative building heights from temporal image sequences, and Affine SIFT makes the matching robust to variations in both camera speed and driving lane. Finally, the BoF representation allows the authors to achieve stable matching at lower computational cost. The authors conducted several experiments on real image sequences of an actual city, and the results show that the proposed method localizes the videos successfully.
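The BoF matching step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes local descriptors have already been extracted (in the paper these would be Affine SIFT descriptors computed on THI; here they are toy 2-D vectors), that a visual vocabulary is already available, and that histograms are compared by cosine similarity. All names (`quantize`, `bof_histogram`, `best_match`, `segment_a`, `segment_b`) are hypothetical.

```python
import math

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest visual word."""
    words = []
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in vocabulary]
        words.append(dists.index(min(dists)))
    return words

def bof_histogram(descriptors, vocabulary):
    """Build an L2-normalised histogram of visual-word counts."""
    hist = [0.0] * len(vocabulary)
    for w in quantize(descriptors, vocabulary):
        hist[w] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist

def cosine_similarity(h1, h2):
    """Dot product of two L2-normalised histograms."""
    return sum(a * b for a, b in zip(h1, h2))

def best_match(query_desc, database, vocabulary):
    """Return the database entry whose BoF histogram is closest to the query."""
    q = bof_histogram(query_desc, vocabulary)
    scores = {
        name: cosine_similarity(q, bof_histogram(desc, vocabulary))
        for name, desc in database.items()
    }
    return max(scores, key=scores.get), scores

# Toy data (assumed for illustration only): three visual words in 2-D.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
database = {
    "segment_a": [(0.1, 0.2), (9.5, 0.3), (9.8, -0.1)],
    "segment_b": [(0.2, 9.7), (-0.1, 10.2), (0.3, 0.1)],
}
query = [(10.2, 0.1), (9.9, 0.4), (0.0, 0.3)]

name, scores = best_match(query, database, vocabulary)
print(name, scores)
```

Because each sequence is reduced to one fixed-length histogram, comparing a query against a database entry costs a single dot product, regardless of how many descriptors either sequence contains. That is the sense in which a BoF representation can lower the matching cost.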
