tgp: A Task-Granularity Profiler for the Java Virtual Machine

The analysis of task granularity (i.e., the amount of work performed by each parallel task) is essential to unveil performance problems in task-parallel applications and to optimize them. Task granularities that are too small may result in high parallelization overheads, while task granularities that are too large may indicate missed parallelization opportunities. Despite its importance, task granularity is not considered by existing profilers for parallel applications on the Java Virtual Machine (JVM). In this paper, we present tgp, a novel task-granularity profiler for multi-threaded applications on the JVM. tgp collects bytecode- and hardware-level metrics to characterize task granularity, assisting the developer in diagnosing and locating parallelization shortcomings.
