PLFS: a checkpoint filesystem for parallel applications
John Bent | Garth A. Gibson | Gary Grider | Milo Polte | Meghan Wingate | Ben McClelland | Paul Nowoczynski | James Nunez
[1] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.
[2] Mendel Rosenblum, et al. The design and implementation of a log-structured file system, 1991, SOSP '91.
[3] James Lau, et al. File System Design for an NFS File Server Appliance, 1994, USENIX Winter.
[4] Jeffrey F. Naughton, et al. Low-Latency, Concurrent Checkpointing for Parallel Programs, 1994, IEEE Trans. Parallel Distributed Syst.
[5] Jim Zelenka, et al. The Scotch parallel storage systems, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[6] Miron Livny, et al. Checkpoint and Migration of UNIX Processes in the Condor Distributed Processing System, 1997.
[7] Kai Li, et al. Diskless Checkpointing, 1998, IEEE Trans. Parallel Distributed Syst.
[8] Rajeev Thakur, et al. Data sieving and collective I/O in ROMIO, 1998, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.
[9] Kai Li, et al. Memory Exclusion: Optimizing the Performance of Checkpointing Systems, 1999, Softw. Pract. Exp.
[10] Douglas Thain, et al. Bypass: a tool for building split execution systems, 2000, Proceedings the Ninth International Symposium on High-Performance Distributed Computing.
[11] Erez Zadok, et al. FIST: a language for stackable file systems, 2000, OPSR.
[12] Andrea C. Arpaci-Dusseau, et al. Implicit coscheduling: coordinated scheduling with implicit information in distributed systems, 2001, TOCS.
[13] L. Alvisi, et al. A Survey of Rollback-Recovery Protocols, 2002.
[14] Jianwei Li, et al. Parallel netCDF: A High-Performance Scientific I/O Interface, 2003, ACM/IEEE SC 2003 Conference (SC'03).
[15] Remzi H. Arpaci-Dusseau, et al. Run-time adaptation in river, 2003, TOCS.
[16] E. N. Elnozahy, et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery, 2004, IEEE Transactions on Dependable and Secure Computing.
[17] Brent Welch, et al. Managing Scalability in Object Storage Systems for HPC Linux Clusters, 2004, MSST.
[18] Tyce T. McLarty, et al. Parallel file system testing for the lunatic fringe: the care and feeding of restless I/O power users, 2005, 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05).
[19] Jason Duell, et al. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, 2006.
[20] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[21] Samuel Lang, et al. GIGA+: scalable directories for shared file systems, 2007, PDSW '07.
[22] Eduardo Pinheiro, et al. Failure Trends in a Large Disk Drive Population, 2007, FAST.
[23] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[24] Jeffrey S. Vetter, et al. Exploiting Lustre File Joining for Effective Collective IO, 2007, Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07).
[25] M. Polte, et al. Fast log-based concurrent writing of checkpoints, 2008, 2008 3rd Petascale Data Storage Workshop.
[26] Sudharshan S. Vazhkudai, et al. Aggregate Memory as an Intermediate Checkpoint Storage Device, 2008.
[27] P. Nowoczynski, et al. Zest Checkpoint storage system for large supercomputers, 2008, 2008 3rd Petascale Data Storage Workshop.
[28] Karsten Schwan, et al. Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS), 2008, CLADE '08.
[29] Bin Zhou, et al. Scalable Performance of the Panasas Parallel File System, 2008, FAST.
[30] Matei Ripeanu, et al. stdchk: A Checkpoint Storage System for Desktop Grid Computing, 2007, 2008 The 28th International Conference on Distributed Computing Systems.
[31] Robert B. Ross, et al. Coordinating government funding of file system and I/O research through the high end computing university research activity, 2009, OPSR.
[32] Bianca Schroeder, et al. A Large-Scale Study of Failures in High-Performance Computing Systems, 2010, IEEE Trans. Dependable Secur. Comput.