HELIX: automatic parallelization of irregular programs for chip multiprocessing
Gu-Yeon Wei | David M. Brooks | Simone Campanoni | Vijay Janapa Reddi | Timothy M. Jones | Glenn H. Holloway
[1] Pen-Chung Yew,et al. Statement Re-ordering for DOACROSS Loops , 1994, ICPP.
[2] Guilherme Ottoni,et al. Performance scalability of decoupled software pipelining , 2008, TACO.
[3] Yun Zhang,et al. Decoupled software pipelining creates parallelization opportunities , 2010, CGO '10.
[4] Alexander Aiken,et al. Perfect Pipelining: A New Loop Parallelization Technique , 1988, ESOP.
[5] Chi-Keung Luk,et al. Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[6] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[7] Easwaran Raman,et al. Practical and accurate low-level pointer analysis , 2005, International Symposium on Code Generation and Optimization.
[8] Alexander V. Veidenbaum,et al. Synchronization optimizations for efficient execution on multi-cores , 2009, ICS '09.
[9] Andrew W. Appel,et al. Modern Compiler Implementation in Java, 2nd edition , 2002 .
[10] Krishna M. Kavi,et al. Parallelization of DOALL and DOACROSS Loops - A Survey , 1997, Adv. Comput..
[11] Björn Franke,et al. Towards a holistic approach to auto-parallelization , 2009 .
[12] Easwaran Raman,et al. Parallel-stage decoupled software pipelining , 2008, CGO '08.
[13] Donald Yeung,et al. Physical experimentation with prefetching helper threads on Intel's hyper-threaded processors , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[14] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[15] Yale N. Patt,et al. Simultaneous subordinate microthreading (SSMT) , 1999, ISCA.
[16] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[17] Krishna M. Kavi,et al. A loop allocation policy for DOACROSS loops , 1996, Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing.
[18] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[19] Pen-Chung Yew,et al. On Effective Execution of Nonuniform DOACROSS Loops , 1996, IEEE Trans. Parallel Distributed Syst..
[20] Feng Liu,et al. Scalable Speculative Parallelization on Commodity Clusters , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[21] Andrew W. Appel,et al. Modern Compiler Implementation in Java , 1997 .
[22] Randy Allen,et al. Optimizing Compilers for Modern Architectures , 2004 .
[23] Alexandru Nicolau,et al. Techniques for efficient placement of synchronization primitives , 2009, PPoPP '09.
[24] Kunle Olukotun,et al. Exposing speculative thread parallelism in SPEC2000 , 2005, PPoPP.
[25] Yale N. Patt,et al. Simultaneous subordinate microthreading , 2004 .
[26] Pen-Chung Yew,et al. Efficient Doacross execution on distributed shared-memory multiprocessors , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[27] Ron Cytron. Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.
[28] William Thies,et al. A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[29] Xiaotong Zhuang,et al. Exploiting Parallelism with Dependence-Aware Scheduling , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[30] G. Amdahl,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[31] Giovanni Agosta,et al. A highly flexible, parallel virtual machine: design and experience of ILDJIT , 2010, Softw. Pract. Exp..
[32] Arun Raman,et al. Speculative parallelization using software multi-threaded transactions , 2010, ASPLOS XV.
[33] Seung-Ju Jang,et al. Spin-block synchronization algorithm in the shared memory multiprocessor system , 1994, OPSR.
[34] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.
[35] Pen-Chung Yew,et al. Redundant Synchronization Elimination for DOACROSS Loops , 1999, IEEE Trans. Parallel Distributed Syst..
[36] Lawrence Rauchwerger,et al. Speculative Parallelization of Partially Parallel Loops , 2000, LCR.
[37] Cheng-Zhong Xu,et al. Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences , 2001, IEEE Trans. Parallel Distributed Syst..
[38] Ding-Kai Chen, Pen-Chung Yew. An Empirical Study on DOACROSS Loops , 1991 .
[39] Minyi Guo,et al. Optimal loop parallelization for maximizing iteration-level parallelism , 2009, CASES '09.
[40] Dean M. Tullsen,et al. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading , 1997, TOCS.
[41] Yun Zhang,et al. Revisiting the Sequential Programming Model for the Multicore Era , 2008, IEEE Micro.
[42] Easwaran Raman,et al. Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[43] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.