Deep Start: a hybrid strategy for automated performance problem searches

We present Deep Start, a new algorithm for automated performance diagnosis that uses stack sampling to augment our search-based automated performance diagnosis strategy. Our hybrid approach locates performance problems more quickly and finds problems hidden from a more straightforward search strategy. Deep Start uses stack samples collected as a by-product of normal search instrumentation to find deep starters, functions that are likely to be application bottlenecks. Deep starters are examined early during a search to improve the likelihood of finding performance problems quickly. We implemented the Deep Start algorithm in the Performance Consultant, Paradyn's automated bottleneck detection component. Deep Start found half of our test applications' known bottlenecks 32% to 59% faster than the Performance Consultant's current call graph-based search strategy, and finished finding bottlenecks 10% to 61% faster. In addition to improving search time, Deep Start often found more bottlenecks than the call graph search strategy.
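The idea of picking deep starters from stack samples and examining them first can be illustrated with a minimal sketch. It is not the Performance Consultant's implementation: the function names `find_deep_starters` and `order_search`, the `threshold` parameter, and the rule "deepest frame that appears in at least a given fraction of samples" are all assumptions made for illustration. Each sample is assumed to be a list of function names ordered from outermost caller to innermost callee.

```python
from collections import Counter


def find_deep_starters(stack_samples, threshold=0.5):
    """Rank candidate deep starters from a list of stack samples.

    Assumed heuristic (not taken from the paper): a function is
    "frequent" if it appears in at least `threshold` of the samples;
    for each sample, the deepest frequent frame earns one vote.
    """
    total = len(stack_samples)
    if total == 0:
        return []

    # Count each function at most once per sample.
    appearances = Counter()
    for stack in stack_samples:
        appearances.update(set(stack))
    frequent = {f for f, c in appearances.items() if c / total >= threshold}

    # Vote for the deepest frequent frame of every sample.
    votes = Counter()
    for stack in stack_samples:
        for func in reversed(stack):  # innermost frame first
            if func in frequent:
                votes[func] += 1
                break

    # Most-voted first; ties keep first-seen order.
    return [func for func, _ in votes.most_common()]


def order_search(candidates, deep_starters):
    """Place deep starters ahead of the remaining search candidates."""
    starters = [f for f in deep_starters if f in candidates]
    chosen = set(starters)
    rest = [f for f in candidates if f not in chosen]
    return starters + rest


# Hypothetical samples; the function names are invented for the example.
samples = [
    ["main", "solve", "exchange"],
    ["main", "solve", "exchange"],
    ["main", "solve", "compute"],
    ["main", "io"],
]
starters = find_deep_starters(samples, threshold=0.5)
search_order = order_search(
    ["main", "io", "solve", "compute", "exchange"], starters
)
```

In this sketch a plain top-down call graph search would reach `exchange` only after refining through `main` and `solve`, whereas the reordered search tests it first.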
