Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization

The performance of R, a popular data analysis language, has never been well understood. Some users claim that their R code runs as efficiently as native code, while others report slowdowns of orders of magnitude relative to equivalent C implementations. We found that both claims can hold, depending on how the R code is written. This paper introduces a first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue code). R's most serious overheads appear mostly in Type I code, whereas many Type III programs can be quite fast. This paper focuses on improving the performance of Type I R code. We propose the ORBIT VM, an extension of the GNU R VM that aggressively removes object allocations and reduces instruction path lengths through profile-driven specialization. The ORBIT VM is fully compatible with the R language and relies purely on interpreted execution. It is a specialization JIT and runtime built around two techniques: data representation specialization and operation specialization. On our Type I benchmarks, ORBIT achieves an average speedup of 3.5X over the current release of the GNU R VM and outperforms most other currently available R optimization projects.
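The following Python sketch is a toy model of the Type I problem. It is not ORBIT's implementation. It shows how naively interpreting a scalar loop such as `s <- 0; for (i in seq_along(x)) s <- s + x[i]` can allocate a new boxed object for every intermediate value. It also shows how data representation specialization removes those allocations. The sketch assumes, as in R, that a scalar is a length-1 vector. The names `RVector`, `box`, `generic_add`, `sum_generic`, `sum_specialized`, and the `ALLOCS` counter are hypothetical.

The specialized version replaces the per-iteration type and length checks with a single guard at entry, which shortens the path length. It then keeps the accumulator unboxed and boxes only the final result. ORBIT's specializations are profile-driven, a step this sketch does not model.

```python
from dataclasses import dataclass


# Hypothetical toy model of an R-style boxed value: every value, even a scalar,
# is a heap-allocated vector object (loosely inspired by GNU R, where scalars
# are length-1 vectors; this is not GNU R's actual data structure).
@dataclass
class RVector:
    rtype: str   # e.g. "double"
    data: list


ALLOCS = 0  # counts boxed RVector objects created by this interpreter model


def box(rtype, data):
    """Allocate a new boxed vector and record the allocation."""
    global ALLOCS
    ALLOCS += 1
    return RVector(rtype, data)


def generic_add(a, b):
    """Generic '+': re-checks operand types and lengths on every call
    and allocates a fresh boxed result."""
    if a.rtype != "double" or b.rtype != "double":
        raise TypeError("only double operands are modeled in this sketch")
    if len(a.data) != 1 or len(b.data) != 1:
        raise ValueError("only scalar operands are modeled in this sketch")
    return box("double", [a.data[0] + b.data[0]])


def sum_generic(x):
    """Naive interpretation of the Type I loop  s <- 0; for (...) s <- s + x[i]."""
    s = box("double", [0.0])
    for v in x.data:
        xi = box(x.rtype, [v])   # x[i] yields a new length-1 vector
        s = generic_add(s, xi)   # '+' yields yet another new vector
    return s


def sum_specialized(x):
    """Specialized version: one type guard at entry, then an unboxed
    accumulator; only the final result is boxed."""
    if x.rtype != "double":
        return sum_generic(x)    # guard failed: fall back to the generic path
    s = 0.0                      # unboxed: no allocation per iteration
    for v in x.data:
        s += v
    return box("double", [s])


x = RVector("double", [1.0, 2.0, 3.0, 4.0])  # input created outside the counters

ALLOCS = 0
r1 = sum_generic(x)
generic_allocs = ALLOCS

ALLOCS = 0
r2 = sum_specialized(x)
specialized_allocs = ALLOCS

print(r1.data, generic_allocs)      # [10.0] 9
print(r2.data, specialized_allocs)  # [10.0] 1
```

With four elements, the generic path allocates nine boxes: one for the initial `s`, then two per iteration. The specialized path allocates only the boxed result, and the saving grows linearly with the loop trip count.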