CSR: Core Surprise Removal in Commodity Operating Systems
[1] Theo Ungerer,et al. Fault detection and tolerance mechanisms for future 1000 core systems , 2013, 2013 International Conference on High Performance Computing & Simulation (HPCS).
[2] S. Borkar,et al. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS , 2008, IEEE Journal of Solid-State Circuits.
[3] Shlomi Dolev,et al. Towards Self-Stabilizing Operating Systems , 2008, IEEE Transactions on Software Engineering.
[4] Paolo Faraboschi,et al. COTSon: infrastructure for full system simulation , 2009, OPSR.
[5] Brian N. Bershad,et al. Recovering device drivers , 2004, TOCS.
[6] Torvald Riegel,et al. Evaluation of AMD's advanced synchronization facility within a complete transactional memory stack , 2010, EuroSys '10.
[7] Feng Zhao,et al. Energy aware consolidation for cloud computing , 2008, CLUSTER 2008.
[8] Hermann Härtig,et al. Who Watches the Watchmen? Protecting Operating System Reliability Mechanisms , 2012, HotDep.
[9] Theo Ungerer,et al. Impact of Message Based Fault Detectors on Applications Messages in a Network on Chip , 2013, 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing.
[10] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[11] Cristian Constantinescu,et al. Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.
[12] Anoop Gupta,et al. Hive: fault containment for shared-memory multiprocessors , 1995, SOSP.
[13] Cecilia Metra,et al. Error correcting code analysis for cache memory high reliability and performance , 2011, 2011 Design, Automation & Test in Europe.
[14] Li Zhao,et al. VM3: Measuring, modeling and managing VM shared resources , 2009, Comput. Networks.
[15] Adrian Schüpbach,et al. The multikernel: a new OS architecture for scalable multicore systems , 2009, SOSP '09.
[16] John R. Douceur,et al. Cycles, cells and platters: an empirical analysis of hardware failures on a million consumer PCs , 2011, EuroSys '11.
[17] Theo Ungerer,et al. Fault Localization in NoCs Exploiting Periodic Heartbeat Messages in a Many-Core Environment , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[18] George Candea,et al. Microreboot - A Technique for Cheap Recovery , 2004, OSDI.
[19] Axel Jantsch,et al. Methods for fault tolerance in networks-on-chip , 2013, CSUR.
[20] Wilson C. Hsieh,et al. Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.
[21] Maged M. Michael,et al. Evaluation of Blue Gene/Q hardware support for transactional memories , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[22] Osman S. Unsal,et al. FaulTM: Error detection and recovery using Hardware Transactional Memory , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[23] George C. Necula,et al. SafeDrive: safe and recoverable extensions using language-based techniques , 2006, OSDI '06.
[24] Fabrice Bellard,et al. QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.
[25] Wolfgang Mauerer,et al. Professional Linux Kernel Architecture , 2008.
[26] Avi Mendelson,et al. Architectural Support for Fault Tolerance in a Teradevice Dataflow System , 2014, International Journal of Parallel Programming.
[27] Brian N. Bershad,et al. Improving the reliability of commodity operating systems , 2005, TOCS.
[28] Paul E. McKenney,et al. Cleaning up Linux's CPU hotplug for real time and energy management , 2012, SIGBED.
[29] Avi Mendelson,et al. TERAFLUX: Harnessing dataflow in next generation teradevices , 2014, Microprocess. Microsystems.
[30] Robert Morris,et al. Optimizing MapReduce for Multicore Architectures , 2010.
[31] Herbert Bos,et al. MINIX 3: a highly reliable, self-repairing operating system , 2006, OPSR.
[32] Michael M. Swift,et al. Chameleon: operating system support for dynamic processors , 2012, ASPLOS XVII.
[33] Guu-Chang Yang. Reliability of semiconductor RAMs with soft-error scrubbing techniques , 1995.
[34] Richard D. Schlichting,et al. Fail-stop processors: an approach to designing fault-tolerant computing systems , 1983, TOCS.
[35] Samuel T. King,et al. Recovery domains: an organizing principle for recoverable operating systems , 2009, ASPLOS.
[36] Jie Liu,et al. Algorithm Design for Performance Aware VM Consolidation , 2013.
[37] Ravi Rajwar,et al. Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[38] Andrzej Kochut,et al. Dynamic Placement of Virtual Machines for Managing SLA Violations , 2007, 2007 10th IFIP/IEEE International Symposium on Integrated Network Management.
[39] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[40] Anand Sivasubramaniam,et al. Fault-aware job scheduling for BlueGene/L systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[41] Jeffrey Katcher,et al. PostMark: A New File System Benchmark , 1997.
[42] Howard Gobioff,et al. The Google file system , 2003, SOSP '03.
[43] Sarita V. Adve,et al. The impact of technology scaling on lifetime reliability , 2004, International Conference on Dependable Systems and Networks, 2004.
[44] Christopher J. Hughes,et al. Performance evaluation of Intel® Transactional Synchronization Extensions for high-performance computing , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[45] Chin-Long Chen,et al. Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review , 1984, IBM J. Res. Dev.
[46] Eduardo Pinheiro,et al. DRAM errors in the wild: a large-scale field study , 2009, SIGMETRICS '09.
[47] Shekhar Y. Borkar,et al. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation , 2005, IEEE Micro.
[48] Calton Pu,et al. An Analysis of Performance Interference Effects in Virtual Environments , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[49] Robert Tappan Morris,et al. An Analysis of Linux Scalability to Many Cores , 2010, OSDI.
[50] Timothy Roscoe,et al. Decoupling Cores, Kernels, and Operating Systems , 2014, OSDI.
[51] Donald E. Porter,et al. TxLinux: using and managing hardware transactional memory in an operating system , 2007, SOSP.
[52] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[53] Gabriel Parmer,et al. Predictable, Efficient System-Level Fault Tolerance in C^3 , 2013, 2013 IEEE 34th Real-Time Systems Symposium.
[54] Yiran Chen,et al. The salvage cache: A fault-tolerant cache architecture for next-generation memory technologies , 2009, 2009 IEEE International Conference on Computer Design.
[55] Gernot Heiser. Many-core chips: a case for virtual shared memory , 2009.
[56] Bran Selic,et al. A survey of fault tolerance mechanisms and checkpoint/restart implementations for high performance computing systems , 2013, The Journal of Supercomputing.
[57] Kevin Klues,et al. Improving per-node efficiency in the datacenter with new OS abstractions , 2011, SoCC.