FutureMapping: The Computational Structure of Spatial AI Systems

We discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic `Spatial AI' perception capability for intelligent embodied devices. A big gap remains between the visual perception performance that devices such as augmented reality eyewear or comsumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments.

[1]  Frank Dellaert,et al.  Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing , 2006, Int. J. Robotics Res..

[2]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Andrew J. Davison,et al.  Real-Time Camera Tracking: When is High Frame-Rate Best? , 2012, ECCV.

[4]  Steve B. Furber,et al.  The SpiNNaker Project , 2014, Proceedings of the IEEE.

[5]  Steven Lovegrove,et al.  Parametric dense visual SLAM , 2011 .

[6]  Dieter Fox,et al.  Self-Supervised Visual Descriptor Learning for Dense Correspondence , 2017, IEEE Robotics and Automation Letters.

[7]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Frank Arens,et al.  [Welcome to the Jungle]. , 2014, Pflege Zeitschrift.

[9]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Stefan Leutenegger,et al.  Semantic Texture for Robust Dense Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[11]  Dieter Fox,et al.  DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks , 2017, Robotics: Science and Systems.

[12]  Randy H. Katz,et al.  A Berkeley View of Systems Challenges for AI , 2017, ArXiv.

[13]  Michael Bosse,et al.  Keyframe-based visual–inertial odometry using nonlinear optimization , 2015, Int. J. Robotics Res..

[14]  Stefan Leutenegger,et al.  SemanticFusion: Dense 3D semantic mapping with convolutional neural networks , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[16]  Kurt Konolige,et al.  Sparse Sparse Bundle Adjustment , 2010, BMVC.

[17]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[18]  Stefan Leutenegger,et al.  ElasticFusion: Dense SLAM Without A Pose Graph , 2015, Robotics: Science and Systems.

[19]  Federico Tombari,et al.  CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[21]  Paul H. J. Kelly,et al.  Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications Using HyperMapper , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[22]  Michael F. P. O'Boyle,et al.  SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[25]  Andrew J. Davison,et al.  Real-time simultaneous localisation and mapping with a single camera , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[26]  Davide Scaramuzza,et al.  SVO: Fast semi-direct monocular visual odometry , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Kurt Konolige,et al.  FrameSLAM: From Bundle Adjustment to Real-Time Visual Mapping , 2008, IEEE Transactions on Robotics.

[28]  Judea Pearl,et al.  Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution , 2018, WSDM.

[29]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[30]  Horst Bischof,et al.  Real-time Computation of Variational Methods on Graphics Hardware , 2007 .

[31]  Stefan Leutenegger,et al.  CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Andrew J. Davison,et al.  Active search for real-time vision , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[33]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[35]  Andrew J. Davison,et al.  Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task , 2017, CoRL.

[36]  Stefan Leutenegger,et al.  Monocular, Real-Time Surface Reconstruction Using Dynamic Level of Detail , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[37]  Andrew J. Davison,et al.  A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[38]  Andrew J. Davison,et al.  Automatically and efficiently inferring the hierarchical structure of visual maps , 2009, 2009 IEEE International Conference on Robotics and Automation.

[39]  J. Pearl Theoretical Impediments to Machine Learning A position paper , 2016 .

[40]  Ian D. Reid,et al.  RSLAM: A System for Large-Scale Mapping in Constant-Time Using Stereo , 2011, International Journal of Computer Vision.

[41]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[43]  Dana H. Ballard,et al.  Animate Vision , 1991, Artif. Intell..

[44]  Viorica Patraucean,et al.  gvnn: Neural Network Library for Geometric Computer Vision , 2016, ECCV Workshops.

[45]  Wolfram Burgard,et al.  OctoMap : A Probabilistic , Flexible , and Compact 3 D Map Representation for Robotic Systems , 2010 .

[46]  Sen Wang,et al.  DeepVO: Towards end-to-end visual odometry with deep Recurrent Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[47]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[48]  Michael F. P. O'Boyle,et al.  Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM , 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[49]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[50]  Richard A. Newcombe,et al.  Dense visual SLAM , 2012 .

[51]  Paul H. J. Kelly,et al.  Application-oriented design space exploration for SLAM algorithms , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[52]  John Markoff,et al.  Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots , 2015 .

[53]  Lourdes Agapito,et al.  Co-fusion: Real-time segmentation, tracking and fusion of multiple objects , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[54]  Sergey Levine,et al.  Deep spatial autoencoders for visuomotor learning , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[55]  Stefan Leutenegger,et al.  SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation? , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[56]  Frank Dellaert,et al.  iSAM2: Incremental smoothing and mapping using the Bayes tree , 2012, Int. J. Robotics Res..

[57]  Tom Drummond,et al.  Machine Learning for High-Speed Corner Detection , 2006, ECCV.

[58]  Piotr Dudek,et al.  Vision Chips with In-pixel Processors for High-performance Low-power Embedded Vision Systems , 2016 .

[59]  Stefan Leutenegger,et al.  Real-Time 3D Reconstruction and 6-DoF Tracking with an Event Camera , 2016, ECCV.