Monitoring and performance measuring distributed systems during operation

This paper describes an integrated tool for monitoring distributed systems continuously during operation. A hybrid monitoring approach is used. As special hardware support, a test and measurement processor (TMP) was designed, which is part of each node in an experimental multicomputer system. Each TMP runs the local parts of the monitoring software for its node, and all TMPs are connected to a central test station via a separate TMP interconnection network. The monitoring system is transparent to users. It permanently observes system behavior, measures system performance, and records system information. The immense amount of information is displayed graphically in easy-to-read charts and graphs in an application-oriented manner. The tools promote an improved understanding of run-time behavior and provide performance measurements from which qualitative and even quantitative assessments of distributed systems can be derived. A prototype of the monitoring facility is operational, and experiments are currently being conducted in our distributed system consisting of several MC68000 microcomputers.
