Parabilis: Speeding up Single-Threaded Applications by Extracting Fine-Grained Threads for Multi-core Execution
Krishna M. Kavi | Oghenekarho Okobiah | Oleg Garitselov | Ademola Fawibe | Izuchukwu Nwachukwu | Mohana Asha Latha Dubasi | Vinay R. Prabhu
[1] Antonia Zhai,et al. A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[2] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[3] Kunle Olukotun,et al. TEST: a Tracer for Extracting Speculative Threads , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[4] Rudolf Eigenmann,et al. Min-cut program decomposition for thread-level speculation , 2004, PLDI '04.
[5] Rudolf Eigenmann,et al. Speculative thread decomposition through empirical optimization , 2007, PPoPP.
[6] Pedro López,et al. Boosting single-thread performance in multi-core systems through fine-grain multi-threading , 2009, ISCA '09.
[7] Michel Dubois,et al. Loop-level Speculative Parallelism in Embedded Applications , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).
[8] Mark Heffernan,et al. Data-Dependency Graph Transformations for Instruction Scheduling , 2005, J. Sched..
[9] Mahmut T. Kandemir,et al. Compiler-directed instruction duplication for soft error detection , 2005, Design, Automation and Test in Europe.
[10] Kunle Olukotun,et al. The Jrpm system for dynamically parallelizing Java programs , 2003, ISCA '03.
[11] Kevin Skadron,et al. Federation: Out-of-Order Execution using Simple In-Order Cores , 2007 .
[12] Mahmut T. Kandemir,et al. A helper thread based EDP reduction scheme for adapting application execution in CMPs , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[13] J. M. Codina,et al. Instruction replication for clustered microarchitectures , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[14] Manoj Franklin,et al. Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency , 2005, IEEE Trans. Computers.
[15] Engin Ipek,et al. Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.
[16] David R. Kaeli,et al. AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures , 2009, IEEE Transactions on Computers.
[17] Bo Han,et al. Prophet Synchronization Thread Model and Compiler Support , 2010, International Symposium on Parallel and Distributed Processing with Applications.
[18] Mark Heffernan,et al. Data-Dependency Graph Transformations for Instruction Scheduling , 2005 .
[19] Josep Torrellas,et al. Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[20] Scott A. Mahlke,et al. Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.