Boosting descriptors condensed from video sequences for place recognition

We investigate the task of efficiently training classifiers to build a robust place recognition system. We advocate an approach that densely captures the facades of buildings and landmarks in video recordings, so as to greedily accumulate as much visual information as possible. Our contributions include (1) a preprocessing step that exploits the temporal continuity intrinsic to video sequences to dramatically increase training efficiency, (2) discriminative training of sparse classifiers for place recognition on the resulting data, using the AdaBoost principle, and (3) methods to speed up recognition using scaled kd-trees and to perform geometric validation on the results. Compared with a straightforward application of scene recognition methods, our method not only allows a much faster training phase but also yields more accurate classifiers. The sparsity of the classifiers also gives them good potential for recognition at high frame rates. We present extensive experimental results to validate these claims.
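The abstract does not say how temporal continuity is exploited in step (1). One plausible reading is sketched below. It assumes that descriptors in consecutive frames are chained into tracks by greedy nearest-neighbour matching under a distance threshold, and that each track is then condensed into its mean descriptor. The function `condense_tracks` and its `max_dist` parameter are hypothetical names for illustration, not the authors' procedure.

```python
# Hypothetical sketch: condensing video descriptors by exploiting temporal continuity.
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def condense_tracks(frames: List[List[Descriptor]], max_dist: float) -> List[List[float]]:
    """Condense per-frame descriptors into one averaged descriptor per track.

    frames[t] holds the local descriptors extracted from video frame t.
    A descriptor in frame t extends the unclaimed track whose last member
    (from frame t-1) is nearest to it, provided that distance is strictly
    below max_dist; otherwise it starts a new track.
    """
    tracks: List[List[Descriptor]] = []
    active: List[int] = []  # tracks that received a descriptor in the previous frame
    for descriptors in frames:
        next_active: List[int] = []
        claimed = set()  # each track may be extended at most once per frame
        for d in descriptors:
            best, best_dist = None, max_dist
            for ti in active:
                if ti in claimed:
                    continue
                dist = math.dist(d, tracks[ti][-1])
                if dist < best_dist:
                    best, best_dist = ti, dist
            if best is None:
                tracks.append([d])  # no close match: start a new track
                next_active.append(len(tracks) - 1)
            else:
                tracks[best].append(d)
                claimed.add(best)
                next_active.append(best)
        active = next_active
    # One condensed descriptor per track: the element-wise mean of its members.
    return [[sum(col) / len(track) for col in zip(*track)] for track in tracks]
```

For example, frames `[[0, 0], [10, 10]]`, `[[0.5, 0], [10, 10.5]]` and `[[1, 0]]` with `max_dist=2.0` yield five raw descriptors. These condense into `[[0.5, 0.0], [10.0, 10.25]]`, which is the kind of reduction in training data that the preprocessing step is meant to provide.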
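Step (2) trains sparse classifiers with AdaBoost, but the abstract does not specify the weak learners. The sketch below assumes discrete AdaBoost. Each weak learner is a threshold on the distance from one candidate (for instance condensed) descriptor to its nearest neighbour among an image's descriptors. Labels are +1 for the target place and -1 otherwise. The names `train_adaboost`, `predict`, `min_dist` and `stump`, as well as the stump form itself, are illustrative assumptions rather than the paper's formulation.

```python
# Hypothetical sketch: discrete AdaBoost over nearest-neighbour distance stumps.
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]
Image = List[Descriptor]  # an image is represented by its set of local descriptors
Stump = Tuple[float, int, float]  # (alpha, candidate index, distance threshold)


def min_dist(c: Descriptor, image: Image) -> float:
    """Distance from candidate descriptor c to its nearest neighbour in the image."""
    return min(math.dist(c, d) for d in image)


def stump(f: float, theta: float) -> int:
    """Weak learner: vote +1 (target place) if a close match exists, else -1."""
    return 1 if f <= theta else -1


def train_adaboost(images: List[Image], labels: List[int],
                   candidates: List[Descriptor], rounds: int) -> List[Stump]:
    n = len(images)
    weights = [1.0 / n] * n
    # feats[j][i]: nearest-neighbour distance from candidate j to image i
    feats = [[min_dist(c, img) for img in images] for c in candidates]
    model: List[Stump] = []
    for _ in range(rounds):
        best = None
        for j, row in enumerate(feats):
            for theta in row:  # observed distances serve as candidate thresholds
                err = sum(w for w, f, y in zip(weights, row, labels)
                          if stump(f, theta) != y)
                if best is None or err < best[0]:
                    best = (err, j, theta)
        err, j, theta = best
        if err >= 0.5:
            break  # no weak learner beats chance on the weighted data
        eps = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        model.append((alpha, j, theta))
        if err == 0:
            break  # training set already separated
        # Re-weight: misclassified images gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump(f, theta))
                   for w, f, y in zip(weights, feats[j], labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model: List[Stump], candidates: List[Descriptor], image: Image) -> int:
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * stump(min_dist(candidates[j], image), theta)
                for alpha, j, theta in model)
    return 1 if score >= 0 else -1
```

Each boosting round selects a single descriptor, so a model trained for `rounds` iterations consults at most that many descriptors at recognition time. This is one way the sparsity mentioned in the abstract could translate into fast recognition.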
