On Evaluating the Renaissance Benchmarking Suite: Variety, Performance, and Complexity

The recently proposed Renaissance suite is composed of modern, real-world, concurrent, and object-oriented workloads that exercise various concurrency primitives of the JVM. Renaissance was used to compare the performance of two state-of-the-art, production-quality JIT compilers (HotSpot C2 and Graal), and to show that the performance differences between them are more significant on Renaissance than on existing suites such as DaCapo and SPECjvm2008. In this technical report, we give an overview of the experimental setup that we used to assess the variety and complexity of the Renaissance suite, as well as its amenability to new compiler optimizations. We then present the obtained measurements in detail.
