A Component Architecture for the Message Passing Interface (MPI): The Systems Services Interface (SSI) of LAM/MPI
[1] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[2] Bjarne Stroustrup,et al. C++ Programming Language , 1986, IEEE Softw..
[3] William Gropp,et al. Dynamic process management in an MPI setting , 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.
[4] Mark J. Clement,et al. Core Algorithms of the Maui Scheduler , 2001, JSSPP.
[5] Hua Zhong,et al. CRAK: Linux Checkpoint/Restart As a Kernel Module , 1996 .
[6] Leslie Lamport,et al. Distributed snapshots: determining global states of distributed systems , 1985, TOCS.
[7] Jeffrey F. Naughton,et al. Real-time, concurrent checkpoint for parallel programs , 1990, PPOPP '90.
[8] Samuel Webb Williams,et al. The Component Object Model: A Technical Overview , 1994 .
[9] Anthony Skjellum,et al. MPI/FT™: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.
[10] Roy Friedman,et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[11] Rajeev Thakur,et al. On implementing MPI-IO portably and with high performance , 1999, IOPADS '99.
[12] Leonid Oliker,et al. System Utilization Benchmark on the Cray T3E and IBM SP , 2000, JSSPP.
[13] Bharat K. Bhargava,et al. Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.
[14] R. Thakur,et al. Improving the Performance of MPI Collective Communication on Switched Networks , 2003 .
[15] Robert B. Ross,et al. Using MPI-2: Advanced Features of the Message Passing Interface , 2003, CLUSTER.
[16] Henri E. Bal,et al. MPI's Reduction Operations in Clustered Wide Area Systems. , 1999 .
[17] Brian Barrett,et al. Boot System Services Interface (SSI) Modules for LAM/MPI API Version 1.0.0 / SSI Version 1.0.0 , 2003 .
[18] Brian Randell. System Structure for Software Fault Tolerance , 1975, IEEE Trans. Software Eng..
[19] Sheng Liang,et al. Dynamic class loading in the Java virtual machine , 1998, OOPSLA '98.
[20] William R. Dieter,et al. A user-level checkpointing library for POSIX threads programs , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[21] Henri E. Bal,et al. Bandwidth-efficient collective communication for clustered wide area systems , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[22] Flaviu Cristian,et al. Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement , 1995, Inf. Comput..
[23] David L. Russell,et al. State Restoration in Systems of Communicating Processes , 1980, IEEE Transactions on Software Engineering.
[24] Nancy A. Lynch,et al. Impossibility of distributed consensus with one faulty process , 1983, PODS '83.
[25] Jack Dongarra,et al. Fault Tolerant Communication Library and Applications for High Performance Computing , 2003 .
[26] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[27] Corporate The MPI Forum,et al. MPI: a message passing interface , 1993, Supercomputing '93.
[28] Jason Duell,et al. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .
[29] Qing Huang,et al. A Comparison of MPICH Allgather Algorithms on Switched Networks , 2003, PVM/MPI.
[30] Al Stevens,et al. C programming , 1990 .
[31] Jack Dongarra,et al. Integrated PVM Framework Supports Heterogeneous Network Computing , 1993 .
[32] Andrew Lumsdaine,et al. A Component Architecture for LAM/MPI , 2003, PVM/MPI.
[33] Michael L. Scott,et al. Algorithms for scalable synchronization on shared-memory multiprocessors , 1991, TOCS.
[34] Erik A. Hendriks,et al. BProc: the Beowulf distributed process space , 2002, ICS '02.
[35] Qianfeng Zhang. MPI collective operations over Myrinet , 2002 .
[36] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[37] Jian Xu,et al. Necessary and Sufficient Conditions for Consistent Global Snapshots , 1995, IEEE Trans. Parallel Distributed Syst..
[38] J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart , 2005 .
[39] Harrick M. Vin,et al. Egida: an extensible toolkit for low-overhead fault-tolerance , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[40] William Gropp. The MPI-2 extensions , 1998 .
[41] William Gropp,et al. MPI-2: Extending the Message-Passing Interface , 1996, Euro-Par, Vol. I.
[42] Ian T. Foster,et al. A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[43] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[44] David R. Butenhof. Programming with POSIX threads , 1993 .
[45] Zhou Lei,et al. The portable batch scheduler and the maui scheduler on linux clusters , 2000 .
[46] Laxmikant V. Kale,et al. A tutorial introduction to charm , 1992 .
[47] CORPORATE Computer Science and Telecommunications Board,et al. Academic careers for experimental computer scientists and engineers , 1994, CACM.
[48] David Chappell,et al. Understanding ActiveX and OLE: a guide for developers and managers , 1996 .
[49] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[50] Jason Duell,et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing , 2005, Int. J. High Perform. Comput. Appl..
[51] Nancy A. Lynch,et al. Impossibility of distributed consensus with one faulty process , 1985, JACM.
[52] Yi-Min Wang,et al. Checkpointing and its applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[53] Jyh-Jong Tsay,et al. Checkpointing Message-Passing Interface (MPI) parallel programs , 1997, Proceedings Pacific Rim International Symposium on Fault-Tolerant Systems.
[54] Brian W. Barrett,et al. The system services interface (SSI) to LAM/MPI , 2003 .
[55] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.
[56] Marc Snir,et al. The Communication Software and Parallel Environment of the IBM SP2 , 1995, IBM Syst. J..
[57] Michel Raynal,et al. Consistent Checkpointing in Message Passing Distributed Systems , 1995 .
[58] James Arthur Kohl,et al. HARNESS: a next generation distributed virtual machine , 1999, Future Gener. Comput. Syst..
[59] Von Welch,et al. Fine-Grain Authorization Policies in the GRID: Design and Implementation , 2003, Middleware Workshops.
[60] Miron Livny,et al. Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.
[61] Ronald Minnich,et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters , 2002, ICS '02.
[62] Péter Urbán,et al. Chasing the FLP impossibility result in a LAN: or, How robust can a fault tolerant server be? , 2001, Proceedings 20th IEEE Symposium on Reliable Distributed Systems.
[63] Dhabaleswar K. Panda,et al. Efficient collective operations using remote memory operations on VIA-based clusters , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[64] Kai Li,et al. CLIP: A Checkpointing Tool for Message Passing Parallel Programs , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[65] Ian T. Foster,et al. Globus: a Metacomputing Infrastructure Toolkit , 1997, Int. J. High Perform. Comput. Appl..
[66] P. Merkey,et al. Beowulf: harnessing the power of parallelism in a pile-of-PCs , 1997, 1997 IEEE Aerospace Conference.
[67] Marvin Solomon,et al. The evolution of Condor checkpointing , 1999 .
[68] Taesoon Park,et al. Checkpointing and rollback-recovery in distributed systems , 1989 .
[69] Greg Burns,et al. LAM: An Open Cluster Environment for MPI , 2002 .
[70] Edward W. Felten,et al. Improving the performance of message-passing applications by multithreading , 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92..
[71] Philip M. Papadopoulos,et al. NPACI: rocks: tools and techniques for easily deploying manageable Linux clusters , 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science.
[72] Mark A. Taylor,et al. Architecture of LA-MPI, a network-fault-tolerant MPI , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[73] Ron Brightwell,et al. The Portals 3.0 Message Passing Interface Revision 1.0 , 1999 .
[74] Bronis R. de Supinski,et al. Exploiting hierarchy in parallel computer networks to optimize collective operation performance , 2000, Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000.
[75] Flaviu Cristian,et al. Reaching agreement on processor-group membership in synchronous distributed systems , 1991, Distributed Computing.
[76] Henri E. Bal,et al. MagPIe: MPI's collective communication operations for clustered wide area systems , 1999, PPoPP '99.
[77] Thomas Hérault,et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[78] Sergei Gorlatch,et al. Send-receive considered harmful: Myths and realities of message passing , 2004, TOPL.
[79] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002 .
[80] Michael Franz. Dynamic Linking of Software Components , 1997, Computer.
[81] Augusto Ciuffoletti,et al. A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.
[82] Xin Yuan,et al. CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters , 2003, PPoPP '03.
[83] Jack J. Dongarra,et al. HARNESS and fault tolerant MPI , 2001, Parallel Comput..
[84] Clemens A. Szyperski,et al. Component software - beyond object-oriented programming , 2002 .
[85] Jack Dongarra,et al. PVM: Experiences, current status and future direction , 1993, Supercomputing '93. Proceedings.
[86] William Gropp,et al. MPI - The Complete Reference: Volume 2, The MPI Extensions , 1998 .
[87] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[88] Brian W. Barrett,et al. Request progression interface (RPI) system services interface (SSI) modules for LAM/MPI , 2003 .
[89] Ian T. Foster,et al. The Globus project: a status report , 1998, Proceedings Seventh Heterogeneous Computing Workshop (HCW'98).
[90] Richard Y. Kain,et al. Rollback Recovery in Distributed Systems Using Loosely Synchronized Clocks , 1992, IEEE Trans. Parallel Distributed Syst..
[91] Geoffrey James. The Tao of Programming , 1987 .
[92] Rajeev Thakur,et al. Data sieving and collective I/O in ROMIO , 1998, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.
[93] Yuval Tamir,et al. ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS , 1984 .
[94] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[95] W. Kent Fuchs,et al. Checkpoint Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems , 1995, IEEE Trans. Parallel Distributed Syst..
[96] Jack J. Dongarra,et al. Visualization and debugging in a heterogeneous environment , 1993, Computer.
[97] Miron Livny,et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System , 1997 .
[98] Scott R. Kohn,et al. Toward a Common Component Architecture for High-Performance Scientific Computing , 1999, HPDC.
[99] Indranil Gupta,et al. On scalable and efficient distributed failure detectors , 2001, PODC '01.
[100] Sam Toueg,et al. The weakest failure detector for solving consensus , 1992, PODC '92.