Spatio-temporal feature-based keyframe detection from video shots using spectral clustering

Keyframe detection is a fundamental component of approaches to large-scale mapping and scene recognition. Assuming that detection is applied to a set of continuously captured frames, this paper presents a keyframe detector that considers not only the frame content, to quantify appearance changes along the sequence, but also the temporal accumulation of evidence. With each frame described as a set of local features, our algorithm provides a unified framework for comparing local features acquired from consecutive frames by building an auxiliary graph based on the locality of features. Spectral clustering is then employed to obtain tentative graph partitions, and validated partitions are associated with keyframes. The approach does not need to estimate the motion of the camera, and the similarity measure defined within this framework can be used with any type of feature. Experimental results using different types of visual features show the strength of our representation. Moreover, an evaluation methodology is defined for the quantitative comparison of our keyframe detector against other similar approaches.
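To make the spectral clustering step concrete, the following is a minimal sketch of a two-way spectral partition of a small affinity graph. It is not the paper's algorithm: the abstract does not specify how the auxiliary graph, the similarity measure, or the partition validation are defined, so the affinity matrix here is a hand-made toy, and the names `spectral_bipartition` and `toy_affinity` are hypothetical. The split uses the second eigenvector of the normalized affinity matrix, in the spirit of normalized cuts, computed by power iteration so that only the Python standard library is needed.

```python
import math

def spectral_bipartition(W, iters=500):
    """Split the nodes of a weighted graph into two groups.

    W is a symmetric affinity matrix (list of lists) with non-negative
    weights. Assumption: every node has positive degree.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    s = [math.sqrt(d) for d in deg]  # D^{1/2} 1: the trivial top eigenvector
    s_norm = math.sqrt(sum(x * x for x in s))
    u = [x / s_norm for x in s]

    # M = (I + D^{-1/2} W D^{-1/2}) / 2 has eigenvalues in [0, 1], so power
    # iteration converges to the largest one that is not deflated away.
    M = [[0.5 * W[i][j] / (s[i] * s[j]) + (0.5 if i == j else 0.0)
          for j in range(n)] for i in range(n)]

    # Deterministic start vector; it must not be parallel to u.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # Deflate the trivial eigenvector, then apply M and renormalize.
        dot = sum(a * b for a, b in zip(v, u))
        v = [a - dot * b for a, b in zip(v, u)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]

    # Map back through D^{-1/2} and threshold at zero.
    return [1 if v[i] / s[i] > 0 else 0 for i in range(n)]

def toy_affinity(n_a=4, n_b=4, strong=1.0, weak=0.05):
    """Two groups of nodes: strong links inside a group, weak links across."""
    n = n_a + n_b
    W = [[weak] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                W[i][j] = 0.0
            elif (i < n_a) == (j < n_a):
                W[i][j] = strong
    return W

labels = spectral_bipartition(toy_affinity())
print(labels)
```

On this toy graph the first four nodes and the last four nodes receive different labels. In a keyframe setting, each resulting partition would still need a validation step before being associated with a keyframe, which this sketch does not attempt.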
