Applying feedback control to a replica management system

Many modern storage systems used for large-scale scientific systems are multi-use, independently administered clusters or grids. A common technique for achieving long-term storage reliability is to create data replicas on multiple servers, but in the presence of server failures, ongoing corrective action must be taken to prevent the loss of both high-value and low-value data. Such a system is difficult to control, and replica management is typically handled in an ad hoc manner. In this work, we claim that repairing prioritized faults is a scheduling problem, founded on the need to minimize a risk-based error function, E. Drawing on experiments with a prototype replica system for molecular simulations, we apply concepts from control system theory to analyze and handle the application of corrective action.
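To make the scheduling view concrete, the following is a minimal sketch of one repair cycle, not the method used in this work. The abstract does not define E, so the sketch assumes a hypothetical form: the value-weighted count of missing replicas. The names `FileState`, `risk_error`, `schedule_repairs`, and `control_step`, as well as the per-cycle repair `budget`, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    name: str
    value: float  # hypothetical importance weight of the data
    target: int   # desired number of replicas
    live: int     # replicas currently reachable


def risk_error(files):
    """Assumed form of E: value-weighted sum of missing replicas."""
    return sum(f.value * max(0, f.target - f.live) for f in files)


def schedule_repairs(files, budget):
    """Plan up to `budget` replica copies, highest-value deficit first.

    With the linear E assumed above, each new copy of a file lowers E
    by that file's value, so a greedy choice by value is sufficient.
    """
    planned = {f.name: 0 for f in files}
    order = []
    for _ in range(budget):
        candidates = [f for f in files if f.live + planned[f.name] < f.target]
        if not candidates:
            break  # nothing left to repair
        best = max(candidates, key=lambda f: f.value)
        planned[best.name] += 1
        order.append(best.name)
    return order


def control_step(files, budget):
    """One feedback cycle: measure E, act, and measure E again."""
    before = risk_error(files)
    by_name = {f.name: f for f in files}
    for name in schedule_repairs(files, budget):
        by_name[name].live += 1  # stands in for copying a replica
    return before, risk_error(files)


if __name__ == "__main__":
    files = [
        FileState("trajectory.dat", value=10.0, target=3, live=1),
        FileState("scratch.log", value=1.0, target=2, live=0),
    ]
    print(control_step(files, budget=3))
```

Under these assumptions, a budget of three copies is spent on the two missing replicas of the high-value file first and then on one replica of the low-value file, which reduces the assumed E from 22.0 to 1.0. A real controller would also have to account for failures that occur while repairs are in flight.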
