Coresets for visual summarization with applications to loop closure

In continuously operating robotic systems, an efficient representation of the previously seen camera feed is crucial. Using a highly efficient coreset-based compression method, we formulate a new method for hierarchical retrieval of frames from large video streams collected online by a moving robot. We demonstrate how to use the resulting structure for efficient loop closure through a novel sampling approach that adapts to the structure of the video. The same structure also allows us to create a highly effective search tool for large-scale videos, which we demonstrate in this paper. We show the efficiency of the proposed approaches for retrieval and loop closure on standard datasets and on a large-scale video from a mobile camera.
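To make the idea of a hierarchical summary built online more concrete, the following is a minimal sketch under stated assumptions, not the method of the paper. It builds a merge-and-reduce tree over a frame stream: each leaf summarizes a fixed-size chunk of frames, and two summaries at the same height are merged and reduced into a parent. The reduce step here is greedy farthest-point selection, a stand-in that carries no coreset guarantee. All names (`farthest_point_sample`, `Node`, `StreamingSummaryTree`, `leaf_size`, `k`) and the use of plain feature vectors with Euclidean distance are hypothetical choices made for illustration.

```python
import math


def farthest_point_sample(items, k):
    """Greedily pick k (frame_id, feature) pairs that are far apart.

    Stand-in for a real coreset reduction; assumed for illustration only.
    """
    if len(items) <= k:
        return list(items)
    chosen = [items[0]]
    # Distance from every item to its nearest chosen representative.
    dist = [math.dist(f, items[0][1]) for _, f in items]
    while len(chosen) < k:
        i = max(range(len(items)), key=dist.__getitem__)
        chosen.append(items[i])
        dist = [min(d, math.dist(f, items[i][1]))
                for d, (_, f) in zip(dist, items)]
    return chosen


class Node:
    """A tree node holding a small summary and links to finer summaries."""

    def __init__(self, summary, children=()):
        self.summary = summary
        self.children = list(children)


class StreamingSummaryTree:
    """Merge-and-reduce tree built online, one frame at a time."""

    def __init__(self, leaf_size=4, k=2):
        self.leaf_size, self.k = leaf_size, k
        self.buffer = []   # frames waiting to form the next leaf
        self.levels = []   # levels[h] is a pending Node of height h, or None
        self.count = 0     # running frame index

    def add_frame(self, feature):
        self.buffer.append((self.count, feature))
        self.count += 1
        if len(self.buffer) == self.leaf_size:
            leaf = Node(farthest_point_sample(self.buffer, self.k))
            self.buffer = []
            self._carry(leaf)

    def _carry(self, node):
        # Like binary addition: merge equal-height nodes and carry upward.
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            left = self.levels[level]
            self.levels[level] = None
            merged = farthest_point_sample(left.summary + node.summary, self.k)
            node = Node(merged, [left, node])
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = node

    def roots(self):
        """Return the current top-level summaries, coarsest last."""
        return [n for n in self.levels if n is not None]
```

In this sketch, retrieval would start from the summaries returned by `roots()` and descend through `children` toward finer summaries of the relevant part of the stream. Only a logarithmic number of pending nodes is held in `levels` at any time, which is what makes the construction suitable for an online stream.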
