The Synchronized Pipelined Parallelism Model
[1] Matthew J. Holliman,et al. Media Applications on Hyper-Threading Technology , 2002 .
[2] William J. Dally,et al. Programmable Stream Processors , 2003, Computer.
[3] Shekhar Y. Borkar,et al. iWarp: an integrated solution to high-speed parallel computing , 1988, Proceedings. SUPERCOMPUTING '88.
[4] Monica S. Lam,et al. Locality Optimizations for Parallel Machines , 1994, CONPAR.
[5] Balaram Sinharoy,et al. IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.
[6] Edward H. Gornish,et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies , 1990 .
[7] Anoop Gupta,et al. The DASH prototype: implementation and performance , 1992, ISCA '92.
[8] H. T. Kung,et al. Applications and Algorithm Partitioning on Warp , 1987, COMPCON.
[9] Dean M. Tullsen,et al. Simultaneous multithreading: a platform for next-generation processors , 1997, IEEE Micro.
[10] Dimitrios S. Nikolopoulos. Code and Data Transformations for Improving Shared Cache Performance on SMT Processors , 2003, ISHPC.
[11] R. Ferreira,et al. Compiler Support for Exploiting Coarse-Grained Pipelined Parallelism , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[12] H. T. Kung,et al. Systolic Arrays (for VLSI) , 1978 .
[13] Jean-Luc Gaudiot,et al. Quantifying the SMT layout overhead - does SMT pull its weight? , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6.
[14] Ricardo Bianchini,et al. Limits on the performance benefits of multithreading and prefetching , 1996, SIGMETRICS '96.
[15] John Paul Shen,et al. Speculative Precomputation: Exploring the Use of Multithreading for Latency , 2002 .
[16] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[17] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[18] Monica S. Lam,et al. Maximizing Parallelism and Minimizing Synchronization with Affine Partitions , 1998, Parallel Comput..
[19] H. T. Kung,et al. The Warp Computer: Architecture, Implementation, and Performance , 1987, IEEE Transactions on Computers.
[20] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.
[21] Monica S. Lam,et al. Communication optimization and code generation for distributed memory machines , 1993 .
[22] Guang R. Gao,et al. Advanced topics in dataflow computing and multithreading , 1994 .
[23] Anoop Gupta,et al. Parallel computer architecture - a hardware / software approach , 1998 .
[24] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[25] Arvind,et al. M-Structures: Extending a Parallel, Non-strict, Functional Language with State , 1991, FPCA.
[26] Monica S. Lam,et al. Blocking and array contraction across arbitrarily nested loops using affine partitioning , 2001, PPoPP '01.
[27] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[28] Jean-Luc Gaudiot,et al. Exploiting global data locality in non-blocking multithreaded architectures , 1997, Proceedings of the 1997 International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN'97).
[29] George Karypis,et al. Introduction to Parallel Computing , 1994 .
[30] Alexander V. Veidenbaum,et al. Compiler-directed data prefetching in multiprocessors with memory hierarchies , 1990, ICS '90.
[31] Pat Conway,et al. The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.
[32] Eitan Grinspun,et al. Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.
[33] D. Marr,et al. Hyper-Threading Technology Architecture and Microarchitecture , 2002 .
[34] Bernd Freisleben,et al. Automatic Parallelization of Divide-and-Conquer Algorithms , 1992, CONPAR.