Heterogeneous Managed Runtime Systems: A Computer Vision Case Study

Real-time 3D space understanding is becoming prevalent across a wide range of applications and hardware platforms. To meet the desired Quality of Service (QoS), computer vision applications tend to be heavily parallelized and exploit any available hardware accelerators. Current approaches to achieving real-time computer vision revolve around programming languages typically associated with High Performance Computing, along with binding extensions for OpenCL or CUDA execution. Such implementations, although high-performing, lack portability across the wide range of diverse hardware resources and accelerators. In this paper, we showcase how a complex computer vision application can be implemented within a managed runtime system. We discuss the complexities of achieving high-performing and portable execution across embedded and desktop configurations. Furthermore, we demonstrate that it is possible to achieve the QoS target of over 30 frames per second (FPS) by exploiting FPGA and GPGPU acceleration transparently through the managed runtime system.
