High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware