A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets
Omer Subasi | Osman S. Unsal | Jesús Labarta | Gulay Yalcin | Ferad Zyulkyarov
[1] Franck Cappello, et al. Addressing failures in exascale computing, 2014, Int. J. High Perform. Comput. Appl.
[2] Robert S. Swarz, et al. Reliable Computer Systems: Design and Evaluation, 1992.
[3] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[4] Paolo Toth, et al. Knapsack Problems: Algorithms and Computer Implementations, 1990.
[5] Polyvios Pratikakis, et al. BDDT: block-level dynamic dependence analysis for deterministic task-based parallelism, 2012, PPoPP '12.
[6] S. E. Michalak, et al. Assessment of the Impact of Cosmic-Ray-Induced Neutrons on Hardware in the Roadrunner Supercomputer, 2012, IEEE Transactions on Device and Materials Reliability.
[7] Jie Liu, et al. Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory, 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[8] Barbara M. Chapman, et al. A Prototype Implementation of OpenMP Task Dependency Support, 2013, IWOMP.
[9] Chi Ching Chi, et al. A Benchmark Suite for Evaluating Parallel Programming Models: Introduction and Preliminary Results, 2011.
[10] D. DeMets, et al. Data integrity, 2020, Controlled Clinical Trials.
[11] Chi Ching Chi, et al. A Benchmark Suite for Evaluating Parallel Programming Models, 2011.
[12] Melvin E. Conway, et al. A multiprocessor system design, 1963, AFIPS '63 (Fall).
[13] Martin Schulz, et al. IPAS: Intelligent protection against silent output corruption in scientific applications, 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[14] David A. Wood, et al. ASR: Adaptive Selective Replication for CMP Caches, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[15] Satoshi Matsuoka, et al. Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM, 2013, ISC.
[16] Joel S. Emer, et al. Techniques to reduce the soft error rate of a high-performance microprocessor, 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[17] Franck Cappello, et al. Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era, 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[18] Bo Fang, et al. Evaluating the Error Resilience of Parallel Programs, 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.
[19] David Fiala. Detection and correction of silent data corruption for large-scale high-performance computing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[21] Franck Cappello, et al. Fault-Tolerant Protocol for Hybrid Task-Parallel Message-Passing Applications, 2015, 2015 IEEE International Conference on Cluster Computing.
[22] Eduard Ayguadé, et al. Programmability and portability for exascale: Top down programming methodology and tools with StarSs, 2013, J. Comput. Sci.
[23] Ben H. H. Juurlink, et al. Using OpenMP superscalar for parallelization of embedded and consumer applications, 2012, 2012 International Conference on Embedded Computer Systems (SAMOS).
[24] Franck Cappello, et al. FTI: High performance Fault Tolerance Interface for hybrid systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[25] Eduard Ayguadé, et al. PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite, 2016, ACM Trans. Archit. Code Optim.
[26] Mikko H. Lipasti, et al. Silent stores for free, 2000, MICRO 33.
[27] Osman S. Unsal, et al. NanoCheckpoints: A Task-Based Asynchronous Dataflow Framework for Efficient and Scalable Checkpoint/Restart, 2015, 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[28] Omer Subasi, et al. Programmer-directed partial redundancy for resilient HPC, 2015, Conf. Computing Frontiers.
[29] Franck Cappello, et al. Toward Exascale Resilience, 2009, Int. J. High Perform. Comput. Appl.
[30] Carol Lochbaum, et al. A block diagram compiler, 1961.
[31] Eduard Ayguadé, et al. Overlapping communication and computation by using a hybrid MPI/SMPSs approach, 2010, ICS '10.
[32] Pascal Felber, et al. Adaptive Selective Replication for Complex Event Processing Systems, 2013, BD3@VLDB.
[33] Alejandro Duran, et al. Support for OpenMP tasks in Nanos v4, 2007, CASCON.
[34] Alejandro Duran, et al. Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures, 2011, Parallel Process. Lett.
[35] John Shalf, et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[36] Vilas Sridharan, et al. A study of DRAM failures in the field, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] Amin Ansari, et al. Shoestring: probabilistic soft error reliability on the cheap, 2010, ASPLOS XV.
[38] Mehdi Baradaran Tahoori, et al. A layout-based approach for Multiple Event Transient analysis, 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).