Parallelization of dynamic languages: synchronizing built-in collections

Dynamic programming languages such as Python and Ruby are widely used, and much effort is spent on making them efficient. One substantial research effort in this direction is enabling parallel code execution. While there has been significant progress, making built-in collections efficient, scalable, and thread-safe remains an open issue. Typical programs in dynamic languages use a few but versatile collection types; these collections are a central ingredient of dynamic environments, yet they are difficult to make safe, efficient, and scalable. In this paper, we propose an approach to efficient concurrent collections that gradually increases the synchronization level according to the dynamic needs of each collection instance. Collections reachable by only a single thread require no synchronization, arrays accessed within bounds need minimal synchronization, and for the general case we adopt the Layout Lock paradigm and extend its design with a lightweight version suited to dynamic languages. We apply our approach to Ruby's Array and Hash collections. Our experiments show that the approach has no overhead on single-threaded benchmarks, scales linearly for Array and Hash accesses, achieves the same scalability as Fortran and Java on classic parallel algorithms, and scales better than other Ruby implementations on Ruby workloads.
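
To make the idea of gradually increasing synchronization levels concrete, the following is a minimal Java sketch of a growable integer array that stays unsynchronized while it is reachable by only one thread and switches to a lock-based strategy once it is shared. The class name AdaptiveIntArray, the markShared hook, and the use of a plain ReentrantReadWriteLock are illustrative assumptions; the paper's implementation uses a Lightweight Layout Lock inside TruffleRuby rather than this simplified scheme.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: a growable int array that starts without any
// synchronization and upgrades to a lock-based strategy once it becomes
// reachable by more than one thread.
final class AdaptiveIntArray {
    private int[] storage;
    private int size;
    private volatile boolean shared = false;  // set when the instance is published to other threads
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    AdaptiveIntArray(int capacity) {
        this.storage = new int[capacity];
        this.size = 0;
    }

    // Hypothetical hook: a runtime would call this when the collection becomes
    // reachable by multiple threads (e.g., when stored into a shared object).
    void markShared() {
        shared = true;
    }

    int get(int index) {
        if (!shared) {                       // local to one thread: no synchronization
            return storage[index];
        }
        lock.readLock().lock();              // shared: layout-preserving access, read side
        try {
            return storage[index];
        } finally {
            lock.readLock().unlock();
        }
    }

    void set(int index, int value) {
        if (!shared) {
            storage[index] = value;
            return;
        }
        lock.readLock().lock();              // in-bounds write does not change the layout
        try {
            storage[index] = value;          // simplification: element writes are plain stores
        } finally {
            lock.readLock().unlock();
        }
    }

    void append(int value) {
        if (!shared) {
            growIfNeeded();
            storage[size++] = value;
            return;
        }
        lock.writeLock().lock();             // layout change (resize): exclusive access
        try {
            growIfNeeded();
            storage[size++] = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    private void growIfNeeded() {
        if (size == storage.length) {
            storage = Arrays.copyOf(storage, Math.max(1, storage.length * 2));
        }
    }
}
```

In this sketch, operations that preserve the array's layout (in-bounds reads and writes) take the read side of the lock and can proceed concurrently, while operations that change the layout (appending and resizing) take the write side. This mirrors the distinction between layout-preserving and layout-changing operations that the Layout Lock design is built around.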
