Best-Effort Lazy Evaluation for Python Software Built on APIs

This paper focuses on an important optimization opportunity in Python-hosted domain-specific languages (DSLs): lazy evaluation, whereby multiple API calls are deferred and then optimized together before execution, rather than executed eagerly, one call at a time and in isolation. In existing support for lazy evaluation, laziness is “terminated” as soon as control passes back to the host language in any way, which limits the opportunities for optimization. This paper presents Cunctator, a framework that extends laziness to more of the Python language, allowing intermediate values from DSLs such as NumPy or Pandas to flow back into the host Python code without triggering evaluation. This exposes more opportunities for optimization and, more generally, allows larger computation graphs to be built, producing speedups of 1.03x to 14.2x on a set of programs that use common libraries and frameworks.
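The idea of deferring API calls and letting intermediate values pass through host code can be illustrated with a minimal sketch. It is not Cunctator's implementation or API: the `Lazy` class, the `array` and `wrap` helpers, and the `scale_and_shift` function are hypothetical names introduced only for this example, and plain Python lists stand in for NumPy arrays. Each operator records a node in a computation graph, and `evaluate()` then computes the whole expression in one pass without materializing intermediate lists.

```python
import operator

# Hypothetical element-wise operations supported by this toy DSL.
OPS = {"add": operator.add, "mul": operator.mul}


class Lazy:
    """A deferred value: records an operation instead of running it."""

    def __init__(self, op, args=(), data=None):
        self.op, self.args, self.data = op, args, data

    def __add__(self, other):
        return Lazy("add", (self, wrap(other)))

    def __mul__(self, other):
        return Lazy("mul", (self, wrap(other)))

    def length(self):
        """Length of the first array leaf reachable from this node, or None."""
        if self.op == "leaf":
            return len(self.data)
        for arg in self.args:
            n = arg.length()
            if n is not None:
                return n
        return None

    def element(self, i):
        """Compute element i by walking the recorded graph."""
        if self.op == "leaf":
            return self.data[i]
        if self.op == "const":
            return self.data
        left, right = self.args
        return OPS[self.op](left.element(i), right.element(i))

    def evaluate(self):
        """Run the whole deferred graph in a single pass over the indices."""
        n = self.length()
        if n is None:
            raise ValueError("graph contains no array leaf")
        return [self.element(i) for i in range(n)]


def wrap(x):
    """Turn a plain scalar into a constant node; leave Lazy nodes as they are."""
    return x if isinstance(x, Lazy) else Lazy("const", data=x)


def array(values):
    """Create a leaf node holding concrete data."""
    return Lazy("leaf", data=list(values))


def scale_and_shift(v, k, b):
    """Ordinary host-language function; it never forces evaluation."""
    return v * k + b


a = array([1, 2, 3])
b = array([10, 20, 30])
tmp = a + b                           # deferred: only a graph node is built
result = scale_and_shift(tmp, 2, 1)   # the intermediate value flows through host code
output = result.evaluate()            # [23, 45, 67]
```

In this sketch the intermediate `tmp` is never computed on its own. The graph for `(a + b) * 2 + 1` stays intact across the call to `scale_and_shift`, which is the kind of larger graph that a deferred-evaluation framework could then optimize as a unit.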
