Sustainable GPU Computing at Scale
[1] Matteo Frigo,et al. Reducers and other Cilk++ hyperobjects , 2009, SPAA '09.
[2] Boleslaw K. Szymanski,et al. Synchronized Distributed Termination , 1985, IEEE Transactions on Software Engineering.
[3] Yuan Shi,et al. Automatic program parallelization using stateless parallel processing architecture , 2004 .
[4] R. Deal. Simulation Modeling and Analysis (2nd Ed.) , 1994 .
[5] Jack B. Dennis,et al. Data Flow Supercomputers , 1980, Computer.
[6] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[7] Thomas Hérault,et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[8] Jerome H. Saltzer,et al. End-to-end arguments in system design , 1984, TOCS.
[9] Jason Duell,et al. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters , 2006 .
[10] John T. Daly,et al. A higher order estimate of the optimum checkpoint interval for restart dumps , 2006, Future Gener. Comput. Syst..
[11] William Gropp,et al. Fault Tolerance in Message Passing Interface Programs , 2004, Int. J. High Perform. Comput. Appl..
[12] Maurice Herlihy,et al. The topological structure of asynchronous computability , 1999, JACM.
[13] Andrew S. Tanenbaum,et al. Distributed systems: Principles and Paradigms , 2001 .
[14] Nicholas Carriero,et al. How to write parallel programs - a first course , 1990 .
[15] Rolf Hempel,et al. The MPI Standard for Message Passing , 1994, HPCN.
[16] Hamid Laga,et al. CUDA (Computer Unified Device Architecture) , 2009 .
[17] Nancy A. Lynch,et al. The impossibility of implementing reliable communication in the face of crashes , 1993, JACM.
[18] Dhabaleswar K. Panda,et al. MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[19] Franck Cappello,et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities , 2009, Int. J. High Perform. Comput. Appl..
[20] Hiroaki Kobayashi,et al. CheCUDA: A Checkpoint/Restart Tool for CUDA Applications , 2009, 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies.
[21] Justin Y. Shi,et al. Decoupling as a Foundation for Large Scale Parallel Computing , 2009, 2009 11th IEEE International Conference on High Performance Computing and Communications.
[22] Keith W. Ross,et al. Computer networking - a top-down approach featuring the internet , 2000 .
[23] E. N. Elnozahy,et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery , 2004, IEEE Transactions on Dependable and Secure Computing.
[24] Daniel Marques,et al. Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[25] Averill M. Law,et al. Simulation Modeling and Analysis , 1982 .
[26] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.