Eliminating global interpreter locks in Ruby through hardware transactional memory

Many scripting languages use a Global Interpreter Lock (GIL) to simplify the internal designs of their interpreters, but this kind of lock severely lowers multithreaded performance on multi-core machines. This paper presents our first results in eliminating the GIL in Ruby using Hardware Transactional Memory (HTM) in the IBM zEnterprise EC12 and Intel 4th Generation Core processors. Although prior prototypes have replaced a GIL with HTM, we tested realistic programs: the Ruby NAS Parallel Benchmarks (NPB), the WEBrick HTTP server, and Ruby on Rails. We devised a new technique that dynamically adjusts the transaction lengths on a per-bytecode basis, balancing the likelihood of transaction aborts against the relative overhead of the instructions that begin and end the transactions. Our results show that HTM achieved 1.9- to 4.4-fold speedups in the NPB programs over the GIL with 12 threads, and 1.6- and 1.2-fold speedups in WEBrick and Ruby on Rails, respectively. The dynamic transaction-length adjustment chose the best transaction lengths for any number of threads and for applications with sufficiently long running times.
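
The trade-off behind per-bytecode transaction-length adjustment can be illustrated with a small sketch. The abstract does not state the actual adjustment policy, so everything below is an assumption made for illustration: the names `SiteState` and `LengthTuner`, the idea of counting starts and aborts per bytecode position over a fixed window, and the threshold and shrink-factor values are all hypothetical. The sketch only shows the general shape of such a scheme: begin with long transactions to amortize the begin/end overhead, and shorten them at a given bytecode position when aborts there become too frequent.

```python
from dataclasses import dataclass


@dataclass
class SiteState:
    """Hypothetical tuning state kept for one bytecode position."""
    length: int       # current transaction length for this position
    started: int = 0  # transactions begun here in the current window
    aborted: int = 0  # how many of those aborted


class LengthTuner:
    """Illustrative policy only: shrink a site's transaction length
    whenever its abort ratio over a window exceeds a threshold."""

    def __init__(self, initial_length=255, window=300,
                 max_abort_ratio=0.01, shrink=0.75):
        # All default values are placeholders, not figures from the paper.
        self.initial_length = initial_length
        self.window = window
        self.max_abort_ratio = max_abort_ratio
        self.shrink = shrink
        self.sites = {}

    def _site(self, pc):
        # Lazily create state the first time a bytecode position is seen.
        if pc not in self.sites:
            self.sites[pc] = SiteState(length=self.initial_length)
        return self.sites[pc]

    def length_for(self, pc):
        """Transaction length to use when a transaction begins at pc."""
        return self._site(pc).length

    def record(self, pc, aborted):
        """Report the outcome of one transaction that began at pc."""
        site = self._site(pc)
        site.started += 1
        if aborted:
            site.aborted += 1
        if site.started >= self.window:
            # Too many aborts in this window: use shorter transactions here.
            if site.aborted > self.max_abort_ratio * site.started:
                site.length = max(1, int(site.length * self.shrink))
            site.started = 0
            site.aborted = 0


if __name__ == "__main__":
    tuner = LengthTuner(initial_length=16, window=10,
                        max_abort_ratio=0.1, shrink=0.5)
    for i in range(10):
        tuner.record(pc=7, aborted=(i % 2 == 0))  # contended position
        tuner.record(pc=9, aborted=False)         # uncontended position
    print(tuner.length_for(7), tuner.length_for(9))
```

In this sketch a contended position converges toward short transactions while an uncontended one keeps its long initial length, which is the kind of per-position behavior the abstract describes. A real interpreter would keep this state in C next to the bytecode and might also lengthen transactions again, which this sketch does not attempt.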
