Capturing and automating performance diagnosis: the Poirot approach

Performance diagnosis, the process of finding and explaining performance problems, is an important part of parallel programming. Effective performance diagnosis requires that the programmer plan an appropriate method and manage the experiments that method requires. This paper presents Poirot, an architecture to support performance diagnosis. It explains how the architecture helps plan and manage the diagnosis process automatically and adaptably. The paper evaluates the generality and practicality of Poirot by reconstructing diagnosis methods found in several published performance tools.
