Hardware Support for Multithreaded Execution of Loops with Limited Parallelism