Optimization coaching for fork/join applications on the Java virtual machine

Developing parallel applications that take full advantage of the available hardware resources remains challenging. We tackle this issue for fork/join applications running on a single Java Virtual Machine (JVM) on a shared-memory multicore. An optimal fork/join application should maximize parallelism while minimizing overheads, and maximize locality while minimizing contention. Unfortunately, reaching these goals is difficult because tuning fork/join applications is complex. As a result, fork/join applications often suffer from performance issues such as excessive object creation and reclamation, suboptimal forking, load imbalance, and inappropriate synchronization. Instead of the extensive manual experimentation commonly required to tune fork/join applications properly, we devise a coaching tool that automatically points developers to the specific parts of an application where performance problems originate and suggests concrete code modifications to fix them. Given the increasing popularity of fork/join parallelism on the JVM, many applications can benefit from our approach, including those using standard Java APIs such as Streams and CompletableFuture.
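To make the forking trade-off concrete, here is a minimal sketch of a fork/join task that stops forking below a sequential cutoff. It is an illustration only, not the coaching tool's output or the paper's method. ThresholdSum, SumTask, and the THRESHOLD value are hypothetical; RecursiveTask, fork(), join(), compute(), and ForkJoinPool.commonPool() are standard java.util.concurrent APIs. A cutoff that is too small creates many tiny task objects whose scheduling and allocation overhead outweighs the useful work (excessive object creation, suboptimal forking). A cutoff that is too large leaves cores idle (reduced parallelism, possible load imbalance).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ThresholdSum {
    // Hypothetical sequential cutoff (an assumption, not a tuned value):
    // ranges at or below this size are summed without forking further.
    static final int THRESHOLD = 10_000;

    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int lo, hi; // half-open range [lo, hi)

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {
                // Small enough: do the work sequentially, no new task objects.
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            SumTask right = new SumTask(data, mid, hi);
            left.fork();                     // make the left half available to other workers
            long rightSum = right.compute(); // compute the right half in the current thread
            return rightSum + left.join();   // wait for (or run) the left half
        }
    }

    public static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(parallelSum(data)); // prints 499999500000
    }
}
```

Forking one half and computing the other in the current thread avoids scheduling an extra task per split. Parallel streams run on the same common pool by default, so a poorly chosen granularity in such code competes for the same worker threads.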
