Leveraging near data processing for high-performance checkpoint/restart
[1] J. Duell. The design and implementation of Berkeley Lab's Linux checkpoint/restart, 2005.
[2] Chanik Park, et al. Active disk meets flash: a case for intelligent SSDs, 2013, ICS '13.
[3] Steven Swanson, et al. Near-Data Processing: Insights from a MICRO-46 Workshop, 2014, IEEE Micro.
[4] Jung Ho Ahn, et al. Corona: System Implications of Emerging Nanophotonic Technology, 2008, 2008 International Symposium on Computer Architecture.
[5] Christian Engelmann, et al. Combining Partial Redundancy and Checkpointing for HPC, 2012, 2012 IEEE 32nd International Conference on Distributed Computing Systems.
[6] Ron Brightwell, et al. On the Viability of Compression for Reducing the Overheads of Checkpoint/Restart-Based Fault Tolerance, 2012, 2012 41st International Conference on Parallel Processing.
[7] Jinsuk Chung, et al. Containment domains: A scalable, efficient, and flexible resilience scheme for exascale systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Kurt B. Ferreira, et al. A checkpoint compression study for high-performance computing systems, 2015, Int. J. High Perform. Comput. Appl.
[9] Yang Liu, et al. Willow: A User-Programmable SSD, 2014, OSDI.
[10] Mohamed S. Abdelfattah, et al. Gzip on a chip: high performance lossless data compression on FPGAs using OpenCL, 2014, IWOCL '14.
[11] Rolf Riesen, et al. libhashckpt: Hash-Based Incremental Checkpointing Using GPU's, 2011, EuroMPI.
[12] Yong Chen, et al. Towards scalable I/O architecture for exascale systems, 2011, MTAGS '11.
[13] David J. DeWitt, et al. Query processing on smart SSDs: opportunities and challenges, 2013, SIGMOD '13.
[14] Chanik Park, et al. Enabling cost-effective data processing with smart SSD, 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).
[15] Surendra Byna, et al. Accelerating Science with the NERSC Burst Buffer Early User Program, 2016.
[16] Bianca Schroeder, et al. Understanding failures in petascale computers, 2007.
[17] James H. Laros, et al. Redundant computing for exascale systems, 2010.
[18] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[19] Yuan Xie, et al. Leveraging 3D PCRAM technologies to reduce checkpoint overhead for future exascale systems, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[20] Peter Desnoyers, et al. Active flash: towards energy-efficient, in-situ data analytics on extreme-scale machines, 2013, FAST.
[21] Laxmikant V. Kalé, et al. FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, 2004, 2004 IEEE International Conference on Cluster Computing.
[22] Andrew Lumsdaine, et al. The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[23] Jason Duell, et al. The design and implementation of Berkeley Lab's Linux checkpoint/restart, 2005.
[24] Franck Cappello, et al. Optimization of Multi-level Checkpoint Model for Large Scale HPC Applications, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[25] Kiyoung Choi, et al. A scalable processing-in-memory accelerator for parallel graph processing, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[26] Improving Performance via Mini-applications, Sandia Report, 2009.
[27] Bogdan Nicolae, et al. Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[28] Jinyoung Lee, et al. Biscuit: A Framework for Near-Data Processing of Big Data Workloads, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[29] Dhabaleswar K. Panda, et al. A 1 PB/s file system to checkpoint three million MPI tasks, 2013, HPDC.
[30] André Brinkmann, et al. Deduplication Potential of HPC Applications' Checkpoints, 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[31] Sungroh Yoon, et al. Near-Data Processing for Machine Learning, 2016, ArXiv.
[32] Milos Prvulovic, et al. Euripus: A flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[33] Peter Deutsch. DEFLATE Compressed Data Format Specification version 1.3, 1996, RFC 1951.
[34] Bronis R. de Supinski, et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[35] John Shalf, et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[36] Jim Rogers. Power Efficiency and Performance with ORNL's Cray XK7 Titan, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[37] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[38] Kurt B. Ferreira, et al. Keeping checkpoint/restart viable for exascale systems, 2011.
[39] Kurt B. Ferreira, et al. On the Viability of Checkpoint Compression for Extreme Scale Fault Tolerance, 2011, Euro-Par Workshops.
[40] Scott Klasky, et al. Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[41] Luca Benini, et al. Design and Evaluation of a Processing-in-Memory Architecture for the Smart Memory Cube, 2016, ARCS.
[42] Dejan S. Milojicic, et al. Optimizing Checkpoints Using NVM as Virtual Memory, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[43] John Bent, et al. PLFS: a checkpoint filesystem for parallel applications, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.