VizGen: accelerating visual computing prototypes in dynamic languages

This paper introduces a novel domain-specific compiler that translates visual computing programs written in dynamic languages into highly efficient code. By "dynamic" languages we mean languages such as Python and MATLAB, which feature dynamic typing and flexible array operations. Such language features are useful for rapid prototyping; however, the dynamic computation model introduces significant overhead in program execution time. We introduce a compiler framework for accelerating visual computing programs, such as graphics and vision programs, written in general-purpose dynamic languages. By specializing for visual computation, our compiler achieves substantial performance gains (frequently orders of magnitude) over general compilers for dynamic languages. Specifically, our compiler takes advantage of three key properties of visual computing programs that permit optimizations: (1) many array data structures have small, constant, or bounded size; (2) many operations on visual data are supported in hardware or are embarrassingly parallel; and (3) humans are not sensitive to the small numerical errors in visual outputs that result from changing floating-point precision. Our compiler integrates previously described program transformations and improves existing transformations to handle visual programs that perform complicated array computations. In particular, we show that dependent type analysis can be used to infer array sizes and to guide optimizations for the many small-sized array operations that arise in visual programs. Programmers who are not experts in visual computation can use our compiler to produce Python programs that are more efficient than manually parallelized C written by the same programmers, with fewer lines of application logic.
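
To make property (1) concrete, the following is a minimal hand-written sketch of the kind of specialization that becomes possible once an array size is known statically. It is not VizGen's output or API: the function names `convolve_generic` and `convolve_3x3` are hypothetical, and the example assumes a size analysis has already established that the kernel is exactly 3x3.

```python
# Illustrative sketch only; function names are hypothetical, not part of VizGen.

def convolve_generic(img, kernel):
    """Valid-region convolution where the kernel size is discovered at run time."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            # Inner loops have bounds unknown to the compiler.
            for j in range(kh):
                for i in range(kw):
                    acc += kernel[j][i] * img[y + j][x + i]
            out[y][x] = acc
    return out


def convolve_3x3(img, kernel):
    """Same computation, assuming the kernel is known to be exactly 3x3.

    The two inner loops are fully unrolled and the nine kernel entries
    are held in scalars instead of being indexed on every iteration.
    """
    (k00, k01, k02), (k10, k11, k12), (k20, k21, k22) = kernel
    oh, ow = len(img) - 2, len(img[0]) - 2
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        r0, r1, r2 = img[y], img[y + 1], img[y + 2]
        for x in range(ow):
            out[y][x] = (
                k00 * r0[x] + k01 * r0[x + 1] + k02 * r0[x + 2]
                + k10 * r1[x] + k11 * r1[x + 1] + k12 * r1[x + 2]
                + k20 * r2[x] + k21 * r2[x + 1] + k22 * r2[x + 2]
            )
    return out
```

Both functions compute the same result for a 3x3 kernel; the second simply shows the shape of code that a size-aware compiler could emit, where the unrolled body is also a natural candidate for vectorization or translation to a statically typed language.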
