Navigating the Landscape for Real-time Localisation and Mapping for Robotics, Virtual and Augmented Reality

Visual understanding of 3D environments in real time, at low power, is a huge computational challenge. Often referred to as SLAM (Simultaneous Localisation and Mapping), it is central to applications spanning domestic and industrial robotics, autonomous vehicles, and virtual and augmented reality. This paper describes the results of a major research effort to assemble the algorithms, architectures, tools, and systems software needed to enable delivery of SLAM, by supporting applications specialists in selecting and configuring the appropriate algorithm, hardware, and compilation pathway to meet their performance, accuracy, and energy consumption goals. The major contributions we present are (1) tools and methodology for systematic quantitative evaluation of SLAM algorithms, (2) automated, machine-learning-guided exploration of the algorithmic and implementation design space with respect to multiple objectives, (3) end-to-end simulation tools to enable optimisation of heterogeneous, accelerated architectures for the specific requirements of the various SLAM algorithmic approaches, and (4) tools for delivering, where appropriate, accelerated, adaptive SLAM solutions in a managed, JIT-compiled, adaptive runtime context.
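To make "exploration with respect to multiple objectives" concrete, the following minimal Python sketch filters a handful of candidate SLAM configurations down to their Pareto front over runtime, accuracy, and energy. It illustrates only the selection criterion, not the machine-learning-guided search itself, and it is not taken from the paper's tooling. The `Config` type, the configuration names, and all numeric values are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One hypothetical SLAM configuration with three objectives to minimise."""
    name: str
    runtime_s: float  # seconds per frame (lower is better)
    error_m: float    # trajectory error in metres (lower is better)
    energy_j: float   # joules per frame (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    a_obj, b_obj = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_obj, b_obj))
    strictly_better = any(x < y for x, y in zip(a_obj, b_obj))
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up measurements, for illustration only.
candidates = [
    Config("A", 0.030, 0.045, 1.2),
    Config("B", 0.020, 0.060, 0.9),
    Config("C", 0.035, 0.050, 1.5),  # worse than A on all three objectives
    Config("D", 0.050, 0.030, 2.0),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

Here configuration C is discarded because A beats it on every objective, while A, B, and D each represent a different trade-off and are all retained. An applications specialist would then pick from the front according to their own performance, accuracy, and energy budget.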
