Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications Using HyperMapper

In this paper we investigate an emerging application, 3D scene understanding, that is likely to become significant in the mobile space in the near future. The goal of this exploration is to reduce execution time while meeting our quality-of-result objectives. In previous work, we showed for the first time that it is possible to map this application to power-constrained embedded systems, highlighting that choices made at the algorithmic design level have the most significant impact. Because the algorithmic design space is too large to be evaluated exhaustively, we use HyperMapper, a previously introduced multi-objective prediction framework based on random forests and active learning, to find good algorithmic designs. We show that HyperMapper generalizes to a recent cutting-edge 3D scene understanding algorithm and to a modern GPU-based computer architecture. For the class of computer vision applications considered in this paper, HyperMapper automatically finds algorithmic parameters that beat those hand-tuned by a human expert. In addition, through crowd-sourcing with a 3D scene understanding Android app, we show that the Pareto front obtained on an embedded system can be used to accelerate the same application on all 83 participating smartphones and tablets, with speedups ranging from 2x to over 12x.
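
As a rough illustration of the Pareto front mentioned above, the following sketch filters a set of measured configurations down to the non-dominated ones. It is not HyperMapper's implementation: the two objectives (runtime and error, both minimised), the sample values, and the function names `dominates` and `pareto_front` are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical objective pair: (runtime in seconds, trajectory error in metres).
# Both objectives are assumed to be minimised.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates, sorted by runtime."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Made-up measurements for five algorithmic configurations (illustrative only).
configs = [(0.10, 0.050), (0.05, 0.080), (0.20, 0.045), (0.12, 0.060), (0.05, 0.090)]
front = pareto_front(configs)
print(front)
```

Each point that survives represents a distinct trade-off between speed and accuracy; a configuration is discarded only when some other configuration is at least as good on both objectives and strictly better on one.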
