Fault-Tolerant Dynamic Task Graph Scheduling
Sriram Krishnamoorthy | Gagan Agrawal | Kunal Agrawal | Mehmet Can Kurt
[1] Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing, 2011, HPDC '11.
[2] Esteban Meneses Rojas. Scalable message-logging techniques for effective fault tolerance in HPC applications, 2013.
[3] Hui Liu, et al. High performance linpack benchmark: a fault tolerant implementation without checkpointing, 2011, ICS '11.
[4] Y.-K. Kwok, et al. Static scheduling algorithms for allocating directed task graphs to multiprocessors, 1999, CSUR.
[5] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[6] Sarita V. Adve, et al. Low-cost program-level detectors for reducing silent data corruptions, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[7] Thomas Hérault, et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[8] Erik Maehle, et al. Fault-Tolerant Dynamic Task Scheduling Based on Dataflow Graphs, 1998.
[9] Xiao Qin, et al. A novel fault-tolerant scheduling algorithm for precedence constrained tasks in real-time heterogeneous systems, 2006, Parallel Comput.
[10] Hui Liu, et al. Algorithm-Based Recovery for Newton's Method without Checkpointing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[11] Luis Ceze, et al. Architecture support for disciplined approximate programming, 2012, ASPLOS XVII.
[12] Werner Vogels, et al. Dynamo: Amazon's highly available key-value store, 2007, SOSP.
[13] Kathryn S. McKinley, et al. Uncertain: a first-order type for uncertain data, 2014, ASPLOS.
[14] Zizhong Chen. Extending algorithm-based fault tolerance to tolerate fail-stop failures in high performance distributed environments, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[15] Eduardo Pinheiro, et al. DRAM errors in the wild: a large-scale field study, 2009, SIGMETRICS '09.
[16] Gerhard Fohler. Adaptive fault-tolerance with statically scheduled real-time systems, 1997, Proceedings Ninth Euromicro Workshop on Real Time Systems.
[17] Laxmikant V. Kalé, et al. ACR: Automatic checkpoint/restart for soft and hard error protection, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[18] Jens Palsberg, et al. Concurrent Collections, 2010, Sci. Program.
[19] Robert D. Blumofe, et al. Scheduling multithreaded computations by work stealing, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.
[20] Charles E. Leiserson, et al. Executing task graphs using work-stealing, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[21] Krithi Ramamritham, et al. Adaptive fault tolerance and graceful degradation under dynamic hard real-time scheduling, 1997, Proceedings Real-Time Systems Symposium.
[22] Dan Grossman, et al. EnerJ: approximate data types for safe and general low-power computation, 2011, PLDI '11.
[23] Timothy A. Davis, et al. A Concurrent Dynamic Task Graph, 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[24] Heather M. Quinn, et al. Terrestrial-based radiation upsets: a cautionary tale, 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).
[25] Barbara Liskov, et al. A design for a fault-tolerant, distributed implementation of Linda, 1989, The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[26] Scott A. Mahlke, et al. Paraprox: pattern-based approximation for data parallel applications, 2014, ASPLOS.
[27] David Fiala. Detection and correction of silent data corruption for large-scale high-performance computing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[28] Martin C. Rinard, et al. Verifying quantitative reliability for programs that execute on unreliable hardware, 2013, OOPSLA.
[29] Jacob Nelson, et al. Approximate storage in solid-state memories, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] C. Greg Plaxton, et al. Thread Scheduling for Multiprogrammed Multiprocessors, 1998, SPAA '98.
[31] Richard D. Schlichting, et al. Supporting Fault-Tolerant Parallel Programming in Linda, 1995, IEEE Trans. Parallel Distributed Syst.
[32] Jacob A. Abraham, et al. Algorithm-Based Fault Tolerance for Matrix Operations, 1984, IEEE Transactions on Computers.
[33] Dong Li, et al. Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[34] Yves Robert, et al. Fault tolerant scheduling of precedence task graphs on heterogeneous platforms, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[35] Ali Movaghar-Rahimabadi, et al. A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments, 2011, ICDIPC.
[36] Kenta Hashimoto. Effective Scheduling of Duplicated Tasks for Fault Tolerance in Multiprocessor Systems, 2002.
[37] Kaushik Roy, et al. Quality programmable vector processors for approximate computing, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Jinsuk Chung, et al. Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems, 2012, HiPC 2012.
[39] Yves Sorel, et al. An algorithm for automatically obtaining distributed and fault-tolerant static schedules, 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings.
[40] Weiguo Liu, et al. A Parallel BSP Algorithm for Irregular Dynamic Programming, 2007, APPT.