DGMonitor: A Performance Monitoring Tool for Sandbox-Based Desktop Grid Platforms

Summary form only given. Accurate, continuous resource monitoring and profiling are critical for performance tuning and scheduling optimization. In desktop grid systems that employ sandboxing, such monitoring is challenging because (1) subjobs inside sandboxes execute in a virtual computing environment and (2) the state of that virtual environment is reset to empty after each subjob completes. DGMonitor is a monitoring tool that builds a global, accurate, and continuous view of real resource utilization for desktop grids with sandboxing. The tool measures performance unobtrusively and reliably, uses a simple performance data model, and is easy to use. Our measurements demonstrate that DGMonitor scales to large desktop grids (up to 12,000 workers) with low monitoring overhead on desktop PCs (less than 0.1% of resources consumed). Although we developed DGMonitor with the Entropia DCGrid platform, the tool is easily integrated into other desktop grid systems. In all of these systems, DGMonitor data can support existing and novel information services, particularly for performance tuning and scheduling.
