Simultaneous Scheduling of Replication and Computation for Bioinformatic Applications on the Grid

One of the first motivations for using grids came from applications that manage large data sets in fields such as high energy physics and the life sciences. To improve the global throughput of software environments, data replicas are usually placed at carefully selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled, yet few approaches provide this coupling. This paper presents an algorithm that combines data management and scheduling using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).
