Hyperdimensional computing as a framework for systematic aggregation of image descriptors

Image and video descriptors are an omnipresent tool in computer vision and in application fields such as mobile robotics. Many hand-crafted and, in particular, learned image descriptors are numerical vectors with a potentially very large number of dimensions. Practical considerations like memory consumption or comparison time call for compact representations. In this paper, we use hyperdimensional computing (HDC) as an approach to systematically combine information from a set of vectors into a single vector of the same dimensionality. HDC is a known technique for symbolic processing with distributed representations in numerical vectors with thousands of dimensions. We present an HDC implementation that is suitable for processing the output of existing and future (deep-learning-based) image descriptors. We discuss how this can be used as a framework to process descriptors together with additional knowledge using simple and fast vector operations. A concrete outcome is a novel HDC-based approach to aggregate a set of local image descriptors together with their image positions into a single holistic descriptor. A comparison with available holistic descriptors and aggregation methods on a series of standard mobile robotics place recognition experiments shows a 20% improvement in average performance over the runner-up and 3.6x better worst-case performance.
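To make the idea of combining a set of vectors into one vector of the same dimensionality concrete, the following is a minimal sketch of the general HDC principle, not the implementation presented in the paper. It assumes random bipolar hypervectors, elementwise multiplication as the binding operator, and elementwise summation as the bundling operator, which is one common choice among several. The dimensionality `D`, the helper names, and the random stand-ins for descriptors and position encodings are all hypothetical.

```python
import math
import random

D = 4096  # assumed hypervector dimensionality (illustrative only)


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1.0, 1.0)) for _ in range(D)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication (self-inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundle a set of hypervectors by elementwise summation."""
    return [sum(column) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)

# Random stand-ins for local descriptors that are assumed to be already
# mapped into the hypervector space, plus one random vector per image position.
descriptors = [random_hv(rng) for _ in range(5)]
positions = [random_hv(rng) for _ in range(5)]

# Bind each descriptor to its position, then bundle everything into one vector.
holistic = bundle([bind(d, p) for d, p in zip(descriptors, positions)])

# Binding with a position vector again approximately recovers its descriptor.
recovered = bind(holistic, positions[0])
sim_match = cosine(recovered, descriptors[0])
sim_other = cosine(recovered, descriptors[1])
print(len(holistic), round(sim_match, 2), round(sim_other, 2))
```

In this sketch the bundled vector has the same length as each input, and the descriptor stored at a queried position is clearly more similar to the recovered vector than an unrelated descriptor is. Real descriptors would first need a mapping into the hypervector space, which this sketch leaves out.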
