Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
[1] Gene Cooperman,et al. DMTCP: Transparent checkpointing for cluster computations and the desktop , 2007, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[2] Chi-Hsiang Yeh. The robust middleware approach for transparent and systematic fault tolerance in parallel and distributed systems , 2003, Proceedings of the 2003 International Conference on Parallel Processing.
[3] Douglas M. Blough,et al. Distributed diagnosis in dynamic fault environments , 2004, IEEE Transactions on Parallel and Distributed Systems.
[4] Laxmikant V. Kalé,et al. Proactive Fault Tolerance in MPI Applications Via Task Migration , 2006, HiPC.
[5] T. K. Altheide,et al. Comparing the human and chimpanzee genomes: Searching for needles in a haystack , 2005 .
[6] Christian Engelmann,et al. A Framework for Proactive Fault Tolerance , 2008, 2008 Third International Conference on Availability, Reliability and Security.
[7] Andrew Lumsdaine,et al. The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[8] Sy-Yen Kuo,et al. Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability , 1998, IEEE Trans. Parallel Distributed Syst..
[9] Franck Cappello,et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities , 2009, Int. J. High Perform. Comput. Appl..
[10] Zizhong Chen,et al. N-Level Diskless Checkpointing , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.
[11] Bianca Schroeder,et al. Understanding failures in petascale computers , 2007 .
[12] Brian K. Shoichet,et al. Computational biology and high performance computing , 1999 .
[13] Rolf Riesen,et al. Fault-tolerance for exascale systems , 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).
[14] D. K. Arvind,et al. Languages and Compilers for Parallel Computing , 2014, Lecture Notes in Computer Science.
[15] Hai Jiang,et al. Process/thread migration and checkpointing in heterogeneous distributed systems , 2004, Proceedings of the 37th Annual Hawaii International Conference on System Sciences.
[16] George Bosilca,et al. Redesigning the message logging model for high performance , 2010, Concurr. Comput. Pract. Exp..
[17] Y. Li,et al. Current Research and Practice in Proactive Fault Management , 2007 .
[18] Douglas J. Tobias,et al. Vector and parallel algorithms for the molecular dynamics simulation of macromolecules on shared‐memory computers , 1991 .
[19] Baharan Mirzasoleiman,et al. Failure Tolerance of Motif Structure in Biological Networks , 2011, PLoS ONE.
[20] Cho-Li Wang,et al. Scalable group-based checkpoint/restart for large-scale message-passing systems , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[21] M. J. Quinn,et al. Parallel Computing: Theory and Practice , 1994 .
[22] Federico D. Sacerdoti,et al. Scalable Algorithms for Molecular Dynamics Simulations on Commodity Clusters , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[23] Patricia González,et al. Fault-tolerant solutions for a MPI compute intensive application , 2007, 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07).
[24] Raimundo José de Araújo Macêdo,et al. An Adaptive Programming Model for Fault-Tolerant Distributed Computing , 2007, IEEE Transactions on Dependable and Secure Computing.
[25] Filip De Turck,et al. Adaptive Task Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids , 2009, IEEE Transactions on Parallel and Distributed Systems.
[26] Rajkumar Buyya,et al. High Performance Cluster Computing: Programming and Applications , 1999 .
[27] Jason Duell,et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing , 2005, Int. J. High Perform. Comput. Appl..
[28] Ramón Díaz-Uriarte,et al. ADaCGH: A Parallelized Web-Based Application and R Package for the Analysis of aCGH Data , 2007, PLoS ONE.
[29] M. Schatz,et al. Searching for SNPs with cloud computing , 2009, Genome Biology.
[30] Meeta Sharma Gupta,et al. Performance implications of periodic checkpointing on large-scale cluster systems , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[31] John Paul Walters,et al. Replication-Based Fault Tolerance for MPI Applications , 2009, IEEE Transactions on Parallel and Distributed Systems.
[32] Daniel Okunbor,et al. Efficient parallel algorithms for molecular dynamics simulations , 1999, Parallel Comput..
[33] Jose Renato Santos,et al. Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[34] Steve Plimpton,et al. Fast parallel algorithms for short-range molecular dynamics , 1993 .
[35] Qi Sun,et al. BioHPC: Computational Biology Application Suite for High Performance Computing , 2010 .
[36] Zizhong Chen,et al. Algorithm-Based Fault Tolerance for Fail-Stop Failures , 2008, IEEE Transactions on Parallel and Distributed Systems.
[37] Israel Koren,et al. Fault-Tolerant Systems , 2007 .
[38] Zhiling Lan,et al. Adaptive Fault Management of Parallel Applications for High-Performance Computing , 2008, IEEE Transactions on Computers.
[39] William H. Sanders,et al. An Adaptive Algorithm for Tolerating Value Faults and Crash Failures , 2001, IEEE Trans. Parallel Distributed Syst..
[40] Christian Engelmann,et al. Proactive Fault Tolerance Using Preemptive Migration , 2009, 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing.
[41] Michael Wooldridge,et al. An Introduction to MultiAgent Systems, Second Edition , 2009 .
[42] P. Zipperlen,et al. Functional genomic analysis of C. elegans chromosome I by systematic RNA interference , 2000, Nature.
[43] Zizhong Chen,et al. Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing , 2005, Int. J. High Perform. Comput. Appl..
[44] Rajkumar Buyya,et al. High Performance Cluster Computing , 1999 .
[45] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[46] Axel W. Krings,et al. Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing , 2009, IEEE Transactions on Dependable and Secure Computing.
[47] Xuejun Yang,et al. FTPA: Supporting Fault-Tolerant Parallel Computing through Parallel Recomputing , 2009, IEEE Transactions on Parallel and Distributed Systems.
[48] Michael Wooldridge,et al. Introduction to multiagent systems , 2001 .