MAGNET: a tool for debugging, analyzing and adapting computing systems

As computing systems grow in complexity, the cluster and grid communities require more sophisticated tools to diagnose, debug and analyze such systems. We have developed a toolkit called MAGNET (Monitoring Apparatus for General kerNel-Event Tracing) that provides a detailed look at operating-system kernel events with very low overhead. The fine-grained information that MAGNET exports from kernel space makes challenging problems amenable to identification and correction. In this paper, we first present the design, implementation and evaluation of MAGNET. Then, we show its use as a diagnostic tool, an online-monitoring tool and a tool for building adaptive applications in clusters and grids.
