HPC-Colony: services and interfaces for very large systems
Laxmikant V. Kalé | José E. Moreira | Terry Jones | Sayantan Chakravorty | Celso L. Mendes | Todd Inglett | Andrew Tauferner
[1] Xiaobo Li,et al. On the Communication Complexity of Generalized 2-D Convolution on Array Processors , 1989, IEEE Trans. Computers.
[2] Franco Zambonelli,et al. Diffusive load-balancing policies for dynamic applications , 1999, IEEE Concurr..
[3] Laxmikant V. Kalé,et al. A fault tolerant protocol for massively parallel systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[4] Laxmikant V. Kalé,et al. Adaptive MPI , 2003, LCPC.
[5] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[6] John K. Ousterhout,et al. Scheduling Techniques for Concurrent Systems , 1982, ICDCS.
[7] Scott Pakin,et al. Dynamic Coscheduling on Workstation Clusters , 1998, JSSPP.
[8] Gengbin Zheng,et al. Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing , 2005 .
[9] Laxmikant V. Kalé,et al. NAMD: Biomolecular Simulation on Thousands of Processors , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[10] José E. Moreira,et al. Blue Gene/L programming and operating environment , 2005, IBM J. Res. Dev..
[11] John Paul Shen,et al. Interprocessor Traffic Scheduling Algorithm for Multiple-Processor Networks , 1987, IEEE Transactions on Computers.
[12] Karen D. Devine,et al. New Challenges in Dynamic Load Balancing , 2004 .
[13] J. Ramanujam,et al. Task allocation onto a hypercube by recursive mincut bipartitioning , 1990, C3P.
[14] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[15] Terry Jones,et al. Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[16] Laxmikant V. Kalé,et al. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI , 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[17] Anthony P. Reeves,et al. Strategies for Dynamic Load Balancing on Highly Parallel Computers , 1993, IEEE Trans. Parallel Distributed Syst..
[18] Anna Hác,et al. Dynamic Load Balancing in a Distributed System Using a Decentralized Algorithm , 1987, ICDCS.
[19] John K. Ousterhout. Scheduling Techniques for Concurrent Systems , 1982, ICDCS 1982.
[20] Paul Terry,et al. Improving application performance on HPC systems with process synchronization , 2004 .
[21] Thierry Coupez,et al. Dynamic load-balancing of finite element applications with the DRAMA library , 2000 .
[22] Jack J. Dongarra,et al. Building and Using a Fault-Tolerant MPI Implementation , 2004, Int. J. High Perform. Comput. Appl..
[23] Anthony Skjellum,et al. MPI/FT™: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.
[24] Roy Friedman,et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[25] Anand Sivasubramaniam,et al. Critical event prediction for proactive management in large-scale computer clusters , 2003, KDD '03.
[26] Adrianos Lachanas,et al. MPI-FT: Portable Fault Tolerance Scheme for MPI , 2000, Parallel Process. Lett..
[27] B. Bouteiller,et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[28] Robert E. Strom,et al. Optimistic recovery in distributed systems , 1985, TOCS.
[29] Willy Zwaenepoel,et al. Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.
[30] Kai Li,et al. CLIP: A Checkpointing Tool for Message Passing Parallel Programs , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[31] Laxmikant V. Kalé,et al. Topology-aware task mapping for reducing communication contention on large parallel machines , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[32] William H. Cabot,et al. Large-scale simulations with Miranda on BlueGene/L , 2003 .
[33] Chao Huang. System Support for Checkpoint and Restart of Charm++ and AMPI Applications , 2004 .
[34] Laxmikant V. Kalé,et al. A load balancing strategy for prioritized execution of tasks , 1993, [1993] Proceedings Seventh International Parallel Processing Symposium.
[35] J. D. Teresco,et al. New challenges in dynamic load balancing , 2005 .
[36] Wesley W. Chu,et al. Task Allocation and Precedence Relations for Distributed Real-Time Systems , 1987, IEEE Transactions on Computers.
[37] Anand Sivasubramaniam,et al. Fault-aware job scheduling for BlueGene/L systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[38] Laxmikant V. Kalé,et al. Proactive Fault Tolerance in MPI Applications Via Task Migration , 2006, HiPC.