The use of threads is becoming commonplace in both sequential and parallel programs. This paper describes our design and initial experience with non-trace-based performance instrumentation techniques for threaded programs. Our goal is to provide detailed performance data while maintaining control of instrumentation costs. We have extended Paradyn's dynamic instrumentation (which can instrument programs without recompiling or relinking) to handle threaded programs.

Controlling instrumentation costs means writing efficient instrumentation code and avoiding locks in the instrumentation. Our design is based on low-contention data structures. To associate performance data with individual threads, we have all threads share the same instrumentation code and assign each thread its own private copy of the performance counters or timers. The asynchrony in a threaded program poses a major challenge to dynamic instrumentation. To implement time-based metrics on a per-thread basis, we need to instrument thread context switches, which can cause instrumentation code to interleave. Interleaved instrumentation can not only corrupt performance data but also cause a scenario we call self-deadlock, in which instrumentation code deadlocks a thread. We introduce thread-conscious locks to avoid self-deadlock, and per-thread virtual CPU timers to reduce both the chance of interleaved instrumentation accessing the same performance counter or timer and the number of expensive timer calls at thread context switches.

Our initial implementation is on SPARC Solaris 2.5 and 2.6, including multiprocessor Sun UltraSPARC Enterprise machines. We tested our tool on large multithreaded applications, including the Java Virtual Machine (JVM). We show how our new techniques helped us speed up a Java graphics native method by 42% and consequently increase by 24% the amount of work that can be done per unit time in a game applet.
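The two central ideas above, shared instrumentation code with per-thread private counters and a lock that knows which thread holds it, can be illustrated with a minimal sketch. This is not Paradyn's implementation: the class names `PerThreadCounter` and `ThreadConsciousLock`, the fixed `max_threads` table size, and the choice to have a re-entering thread skip the instrumentation are all assumptions made for illustration, and Python stands in for what would be generated machine code in the real tool.

```python
import threading


class PerThreadCounter:
    """Hypothetical sketch: every thread runs the same add() code,
    but each thread writes only to its own private slot."""

    def __init__(self, max_threads=64):
        # Assumed fixed-size table; a thread beyond max_threads would
        # raise IndexError in this sketch.
        self._values = [0] * max_threads
        self._next_slot = 0
        self._register_lock = threading.Lock()  # taken once per thread only
        self._local = threading.local()

    def _slot(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:
            # One-time registration, off the hot path.
            with self._register_lock:
                slot = self._next_slot
                self._next_slot += 1
            self._local.slot = slot
        return slot

    def add(self, amount=1):
        # No lock here: only the owning thread ever writes this slot.
        self._values[self._slot()] += amount

    def per_thread(self):
        return self._values[: self._next_slot]


class ThreadConsciousLock:
    """Hypothetical sketch: a lock that records its owner, so that a
    thread re-entering instrumentation it already holds backs off
    instead of blocking on itself (self-deadlock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def try_enter(self):
        me = threading.get_ident()
        if self._owner == me:
            # Only this thread can have set _owner to its own id, so the
            # unlocked read is safe. Skip rather than wait on ourselves.
            return False
        self._lock.acquire()  # a different thread holds it: wait normally
        self._owner = me
        return True

    def leave(self):
        self._owner = None
        self._lock.release()


def run_workers(counts):
    """Start one thread per entry in counts; each adds that many ticks."""
    counter = PerThreadCounter()

    def worker(n):
        for _ in range(n):
            counter.add()

    threads = [threading.Thread(target=worker, args=(n,)) for n in counts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.per_thread()
```

In this sketch the only lock on the counter path is taken once, when a thread first registers; every later update touches thread-private state. The lock's owner check is what distinguishes it from an ordinary mutex: a plain non-reentrant lock would block forever if the holder tried to take it again from interleaved instrumentation.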