Pipelined multithreading transformations and support mechanisms
[1] Guilherme Ottoni,et al. Support for High-Frequency Streaming in CMPs , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[2] Scott Mahlke,et al. Exploiting Instruction Level Parallelism in the Presence of Conditional Branches , 1997 .
[3] B. Ramakrishna Rau,et al. Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.
[4] Babak Falsafi,et al. Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[5] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[6] Gurindar S. Sohi,et al. Instruction issue logic for high-performance, interruptable pipelined processors , 1987, ISCA '87.
[7] Ron Cytron,et al. Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.
[8] Mateo Valero,et al. A decoupled KILO-instruction processor , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[9] Mateo Valero,et al. Toward kilo-instruction processors , 2004, TACO.
[10] John Paul Shen,et al. Speculative precomputation: long-range prefetching of delinquent loads , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[11] Ken Kennedy,et al. Conversion of control dependence to data dependence , 1983, POPL '83.
[12] Michael J. Flynn,et al. Communication mechanisms in shared memory multiprocessors , 1998 .
[13] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[14] Anant Agarwal,et al. Integrating message-passing and shared-memory: early experience , 1993, PPOPP '93.
[15] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[16] Kunle Olukotun,et al. A Single-Chip Multiprocessor , 1997, Computer.
[17] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[18] Sanjay J. Patel,et al. Beating in-order stalls with "flea-flicker" two-pass pipelining , 2006, IEEE transactions on computers.
[19] T. Gross,et al. iWarp: anatomy of a parallel computing system , 1999, IEEE Concurrency.
[20] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[21] Yale N. Patt,et al. Simultaneous subordinate microthreading , 2004 .
[22] Dean M. Tullsen,et al. Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[23] Pradip Bose,et al. Optimizing pipelines for power and performance , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[24] David I. August,et al. Microarchitectural exploration with Liberty , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[25] Long Li,et al. Automatically partitioning packet processing applications for pipelined architectures , 2005, PLDI '05.
[26] David I. August,et al. The liberty structural specification language: a high-level modeling language for component reuse , 2004, PLDI '04.
[27] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.
[28] David I. August,et al. Amortizing Software Queue Overhead for Pipelined Inter-Thread Communication , 2006 .
[29] Lizy Kurian John,et al. Efficiently Evaluating Speedup Using Sampled Processor Simulation , 2004, IEEE Computer Architecture Letters.
[30] Andreas Moshovos,et al. Dependence based prefetching for linked data structures , 1998, ASPLOS VIII.
[31] Haitham Akkary,et al. Checkpoint processing and recovery: towards scalable large instruction window processors , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[32] Monica S. Lam,et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines , 1998 .
[33] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[34] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[35] Jignesh M. Patel,et al. Data prefetching by dependence graph precomputation , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[36] Anant Agarwal,et al. Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.
[37] Anand Sivasubramaniam,et al. Architectural Mechanisms for Explicit Communication in Shared Memory Multiprocessors , 1995, SC.
[38] G. H. Barnes,et al. A controllable MIMD architecture , 1986 .
[39] John Wawrzynek,et al. A Streaming Multi-Threaded Model , 2001 .
[40] John Flynn,et al. Adapting the SPEC 2000 benchmark suite for simulation-based computer architecture research , 2001 .
[41] Gurindar S. Sohi,et al. Speculative data-driven multithreading , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[42] David I. August,et al. Rapid Development of a Flexible Validated Processor Model , 2004 .
[43] Michael Wolfe,et al. Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form , 1995, TOPL.
[44] Masaru Takesue,et al. Software queue-based algorithms for pipelined synchronization on multiprocessors , 2003, 2003 International Conference on Parallel Processing Workshops, 2003. Proceedings.
[45] James R. Goodman,et al. Inferential Queueing and Speculative Push , 2003, ICS '03.
[46] Miodrag Potkonjak,et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[47] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.
[48] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[49] Roy Dz-Ching Ju,et al. Characterization of Repeating Data Access Patterns in Integer Benchmarks , 2001 .
[50] David I. August,et al. Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[51] Jian Huang,et al. The Superthreaded Processor Architecture , 1999, IEEE Trans. Computers.
[52] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[53] Wen-mei W. Hwu,et al. "Flea-flicker" multipass pipelining: an alternative to the high-power out-of-order offense , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[54] G. Sohi,et al. Effective jump-pointer prefetching for linked data structures , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[55] Gurindar S. Sohi,et al. Master/Slave Speculative Parallelization , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[56] Sarita V. Adve,et al. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[57] David Alejandro Padua Haiek. Multiprocessors: discussion of some theoretical and practical problems , 1980 .
[58] Anoop Gupta,et al. Integration of message passing and shared memory in the Stanford FLASH multiprocessor , 1994, ASPLOS VI.
[59] Roland E. Wunderlich,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings.
[60] Easwaran Raman,et al. A framework for unrestricted whole-program optimization , 2006, PLDI '06.
[61] David K. Poulsen. Memory latency reduction via data prefetching and data forwarding in shared memory multiprocessors , 1994 .
[62] Mary K. Vernon,et al. A Hybrid Shared Memory/Message Passing Parallel Machine , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[63] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[64] Thomas F. Wenisch,et al. TurboSMARTS: accurate microarchitecture simulation sampling in minutes , 2005, SIGMETRICS '05.
[65] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[66] S. Vajapeyam,et al. Improving Superscalar Instruction Dispatch And Issue By Exploiting Dynamic Code Sequences , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[67] Eric Rotenberg,et al. A study of slipstream processors , 2000, MICRO 33.
[68] Antonia Zhai,et al. The STAMPede approach to thread-level speculation , 2005, TOCS.