Using Magpie for Request Extraction and Workload Modelling

Tools to understand complex system behaviour are essential for many performance analysis and debugging tasks, yet many research problems in their development remain open. Magpie is a toolchain for automatically extracting a system's workload under realistic operating conditions. Using low-overhead instrumentation, we monitor the system to record fine-grained events generated by kernel, middleware and application components. The Magpie request extraction tool uses an application-specific event schema to correlate these events, and hence precisely capture the control flow and resource consumption of every request. By removing scheduling artefacts while preserving causal dependencies, we obtain canonical request descriptions from which we can construct concise workload models suitable for performance prediction and change detection. In this paper we describe and evaluate Magpie's ability to extract requests accurately and construct representative models of system behaviour.
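
The schema-driven correlation described above can be illustrated with a minimal sketch. It is not Magpie's implementation: the event types, the attribute names (conn, tid, cpu_ms), the SCHEMA layout and the helper extract_requests are all hypothetical, chosen only to show how join attributes declared in a schema might tie interleaved events to the request they belong to, and how a binding that is valid only for a limited time lets a reused thread serve a later request without merging the two.

```python
from collections import defaultdict

# Hypothetical schema (an assumption, not Magpie's schema format): for each
# event type, the attributes usable as join keys and the event's role.
SCHEMA = {
    "RequestStart": {"joins": ("conn", "tid"), "role": "start"},
    "CpuSlice":     {"joins": ("tid",),        "role": "use"},
    "DiskRead":     {"joins": ("tid",),        "role": "use"},
    "RequestEnd":   {"joins": ("conn", "tid"), "role": "end"},
}

def extract_requests(events):
    """Group a flat event log into per-request event lists.

    A (attribute, value) pair is bound to a request only between that
    request's start and end events, a much simplified temporal join.
    """
    bindings = {}                 # (attr, value) -> id of the owning request
    requests = defaultdict(list)  # request id -> events in timestamp order
    next_id = 0
    for ev in sorted(events, key=lambda e: e["ts"]):
        spec = SCHEMA[ev["type"]]
        keys = [(a, ev[a]) for a in spec["joins"] if a in ev]
        if spec["role"] == "start":
            rid = next_id
            next_id += 1
        else:
            rid = next((bindings[k] for k in keys if k in bindings), None)
            if rid is None:
                continue  # not attributable to any open request
        requests[rid].append(ev)
        if spec["role"] == "end":
            # Release every binding held by the finished request.
            for k in [k for k, r in bindings.items() if r == rid]:
                del bindings[k]
        else:
            for k in keys:
                bindings[k] = rid
    return dict(requests)

# Illustrative log: two requests interleaved on threads 1 and 2, then
# thread 1 is reused by a third request.
EVENTS = [
    {"ts": 0, "type": "RequestStart", "conn": "A", "tid": 1},
    {"ts": 1, "type": "RequestStart", "conn": "B", "tid": 2},
    {"ts": 2, "type": "CpuSlice", "tid": 1, "cpu_ms": 3},
    {"ts": 3, "type": "DiskRead", "tid": 2, "bytes": 4096},
    {"ts": 4, "type": "CpuSlice", "tid": 2, "cpu_ms": 5},
    {"ts": 5, "type": "RequestEnd", "conn": "A", "tid": 1},
    {"ts": 6, "type": "RequestEnd", "conn": "B", "tid": 2},
    {"ts": 7, "type": "RequestStart", "conn": "C", "tid": 1},
    {"ts": 8, "type": "CpuSlice", "tid": 1, "cpu_ms": 2},
    {"ts": 9, "type": "RequestEnd", "conn": "C", "tid": 1},
]

requests = extract_requests(EVENTS)
for rid, evs in requests.items():
    cpu = sum(e.get("cpu_ms", 0) for e in evs)
    print(rid, [e["type"] for e in evs], "cpu_ms =", cpu)
```

Dropping the timestamps from each extracted list and keeping only the ordered event types and the summed resource usage gives a rough analogue of a canonical request description: the interleaving with other requests is gone, but the order of events within a request is preserved.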
