Communication optimizations for global multi-threaded instruction scheduling
[1] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[2] E. Ayguade,et al. Modulo scheduling with integrated register spilling for clustered VLIW architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[3] Wen-mei W. Hwu,et al. Field-testing IMPACT EPIC research results in Itanium 2 , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[4] Alexandre E. Eichenberger,et al. Effective cluster assignment for modulo scheduling , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[5] Joe D. Warren,et al. The program dependence graph and its use in optimization , 1984, TOPLAS.
[6] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[7] Anant Agarwal,et al. Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.
[8] Easwaran Raman,et al. A framework for unrestricted whole-program optimization , 2006, PLDI '06.
[9] David I. August,et al. Chip multi-processor scalability for single-threaded applications , 2005, CARN.
[10] D. R. Fulkerson,et al. Flows in Networks , 1964.
[11] Gurindar S. Sohi,et al. Master/Slave Speculative Parallelization , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[12] Nikil D. Dutt,et al. Partitioned register files for VLIWs: a preliminary analysis of tradeoffs , 1992, MICRO 25.
[13] Guilherme Ottoni,et al. Support for High-Frequency Streaming in CMPs , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[14] James R. Larus,et al. Static branch frequency and program profile analysis , 1994, MICRO 27.
[15] Jong-Deok Choi,et al. Global communication analysis and optimization , 1996, PLDI '96.
[16] Guilherme Ottoni,et al. Global Multi-Threaded Instruction Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[17] David I. August,et al. Rapid Development of a Flexible Validated Processor Model , 2004.
[18] Monica S. Lam,et al. Limits of control flow on parallelism , 1992, ISCA '92.
[19] R. K. Shyamasundar,et al. Introduction to algorithms , 1996 .
[20] Hong-Seok Kim,et al. Bottom-Up and Top-Down Context-Sensitive Summary-Based Pointer Analysis , 2004, SAS.
[21] Bernhard Steffen,et al. Lazy code motion , 1992, PLDI '92.
[22] Matthew K. Farrens,et al. Code Partitioning in Decoupled Compilers , 2000, Euro-Par.
[23] Monica S. Lam,et al. Communication optimization and code generation for distributed memory machines , 1993, PLDI '93.
[24] Mahmut T. Kandemir,et al. A global communication optimization technique based on data-flow analysis and linear algebra , 1999, TOPLAS.
[25] Easwaran Raman,et al. Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[26] Vivek Sarkar,et al. A Concurrent Execution Semantics for Parallel Program Graphs and Program Dependence Graphs , 1992, LCPC.
[27] Saman P. Amarasinghe,et al. Convergent scheduling , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[28] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[29] Antonia Zhai,et al. Compiler optimization of scalar value communication between speculative threads , 2002, ASPLOS X.
[30] David S. Johnson,et al. Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978.