Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults
Cosmin Safta | Khachik Sargsyan | Francesco Rizzi | Karla Morris | Paul Mycek | Omar M. Knio | Bert J. Debusschere | Kathryn Dahlgren | Olivier P. Le Maître
[1] Jim Gray, et al. Why Do Computers Stop and What Can Be Done About It?, 1986, Symposium on Reliability in Distributed Software and Database Systems.
[2] Daniel P. Siewiorek, et al. Error log analysis: statistical modeling and heuristic trend analysis, 1990.
[3] An Algebraic Schwarz Theory, 1994.
[4] Nitin H. Vaidya, et al. A case for two-level distributed recovery schemes, 1995, SIGMETRICS '95/PERFORMANCE '95.
[5] Barry Smith, et al. Domain Decomposition Methods for Partial Differential Equations, 1997.
[6] D. Keyes. How Scalable is Domain Decomposition in Practice, 1998.
[7] Michele Benzi, et al. Algebraic theory of multiplicative Schwarz methods, 2001, Numerische Mathematik.
[8] Mark Frederick Hoemmen, et al. An Overview of Trilinos, 2003.
[9] Archana Ganapathi, et al. Why Do Internet Services Fail, and What Can Be Done About It?, 2002, USENIX Symposium on Internet Technologies and Systems.
[10] Mark S. Squillante, et al. Failure data analysis of a large-scale heterogeneous server environment, 2004, International Conference on Dependable Systems and Networks, 2004.
[11] Andrea Toselli, et al. Domain decomposition methods: algorithms and theory, 2005.
[12] Tipp Moseley, et al. Using Process-Level Redundancy to Exploit Multiple Cores for Transient Fault Tolerance, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[13] Sarita V. Adve, et al. Understanding the propagation of hard errors to software and implications for resilient system design, 2008, ASPLOS.
[14] I. Daubechies, et al. Iteratively reweighted least squares minimization for sparse recovery, 2008, 0807.0575.
[15] Franck Cappello, et al. Toward Exascale Resilience, 2009, Int. J. High Perform. Comput. Appl.
[16] George Bosilca, et al. Algorithm-based fault tolerance applied to high performance computing, 2009, J. Parallel Distributed Comput.
[17] Bianca Schroeder, et al. A Large-Scale Study of Failures in High-Performance Computing Systems, 2010, IEEE Trans. Dependable Secur. Comput.
[18] Mahmut T. Kandemir, et al. Analyzing the soft error resilience of linear solvers on multicore multiprocessors, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[19] Kurt B. Ferreira, et al. Keeping checkpoint/restart viable for exascale systems, 2011.
[20] Kurt B. Ferreira, et al. Fault-tolerant iterative methods via selective reliability, 2011.
[21] Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing, 2011, HPDC '11.
[22] Hui Liu, et al. Matrix Multiplication on GPUs with On-Line Fault Tolerance, 2011, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications.
[23] P. Diniz. Exascale Programming Challenges, 2011.
[24] Jack J. Dongarra, et al. High Performance Dense Linear System Solver with Soft Error Resilience, 2011, 2011 IEEE International Conference on Cluster Computing.
[25] P. Oswald, et al. Greedy and Randomized Versions of the Multiplicative Schwarz Method, 2012.
[26] Rakesh Kumar, et al. Algorithmic approaches to low overhead fault detection for sparse linear algebra, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[27] Thomas Hérault, et al. Algorithm-based fault tolerance for dense matrix factorizations, 2012, PPoPP '12.
[28] Kurt B. Ferreira, et al. Fault-tolerant linear solvers via selective reliability, 2012, ArXiv.
[29] Dong Li, et al. Classifying soft error vulnerabilities in extreme-Scale scientific applications using a binary instrumentation tool, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[30] Pete Beckman, et al. Introspective Fault Tolerance for Exascale Systems, 2012.
[31] Nicholas Wilson, et al. Fault-Tolerant Grid-Based Solvers: Combining Concepts from Sparse Grids and MapReduce, 2013, ICCS.
[32] Thomas Hérault, et al. Post-failure recovery of MPI communication capability, 2013, Int. J. High Perform. Comput. Appl.
[33] Christian Engelmann, et al. Toward a Performance/Resilience Tool for Hardware/Software Co-design of High-Performance Computing Systems, 2013, 2013 42nd International Conference on Parallel Processing.
[34] Franck Cappello, et al. Addressing failures in exascale computing, 2014, Int. J. High Perform. Comput. Appl.
[35] Călin Caşcaval, et al. Languages and compilers for parallel computing: 26th International Workshop, LCPC 2013, San Jose, CA, USA, September 25-27, 2013: revised selected papers, 2014.
[36] Franck Cappello, et al. Toward Exascale Resilience: 2014 update, 2014, Supercomput. Front. Innov.
[37] Md. Mohsin Ali, et al. Application Level Fault Recovery: Using Fault-Tolerant Open MPI in a PDE Solver, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[38] John Shalf, et al. Abstract Machine Models and Proxy Architectures for Exascale Computing, 2014, 2014 Hardware-Software Co-Design for High Performance Computing.
[39] Joseph P. Kenny, et al. Using Discrete Event Simulation for Programming Model Exploration at Extreme-Scale: Macroscale Components for the Structural Simulation Toolkit (SST), 2015.
[40] Cosmin Safta, et al. Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults, 2015, CLUSTER.
[41] Cosmin Safta, et al. Fault Resilient Domain Decomposition Preconditioner for PDEs, 2015, SIAM J. Sci. Comput.