Tiled microprocessors
[1] Anant Agarwal,et al. Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[2] Jessica H. Tseng. Banked microarchitectures for complexity-effective superscalar microprocessors , 2006 .
[3] Bob Bentley,et al. Validating the Intel(R) Pentium(R) 4 microprocessor , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).
[4] Karthikeyan Sankaralingam,et al. Routed inter-ALU networks for ILP scalability and performance , 2003, Proceedings 21st International Conference on Computer Design.
[5] William Thies,et al. Phased scheduling of stream programs , 2003 .
[6] José Duato,et al. A General Theory for Deadlock-Free Adaptive Routing Using a Mixed Set of Resources , 2001, IEEE Trans. Parallel Distributed Syst.
[7] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput.
[8] Vikas Agarwal,et al. Clock rate versus IPC: the end of the road for conventional microarchitectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[9] William Thies,et al. Teleport messaging for distributed stream programs , 2005, PPoPP.
[10] B. Flietner,et al. 'System on a chip' technology platform for 0.18 μm digital, mixed signal and eDRAM applications , 1999, International Electron Devices Meeting 1999. Technical Digest (Cat. No.99CH36318).
[11] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[12] Bob Iannucci. Toward a dataflow/von Neumann hybrid architecture , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.
[13] Michael Bedford Taylor,et al. Design decisions in the implementation of a Raw architecture workstation , 1999 .
[14] Marc Tremblay,et al. The Design of the Microarchitecture of UltraSPARC™-I , 1995 .
[15] Anant Agarwal,et al. A quantitative comparison of reconfigurable, tiled, and conventional architectures on bit-level computation , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[16] R. Ho. Chip Wires: Scaling and Efficiency , 2003 .
[17] Samuel D. Naffziger,et al. The implementation of the Itanium 2 microprocessor , 2002, IEEE J. Solid State Circuits.
[18] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[19] Stanley Mazor,et al. The history of the 4004 , 1996, IEEE Micro.
[20] Charles L. Seitz,et al. Design of the Mosaic Element , 1983 .
[21] Venkatesh Akella,et al. Synchroscalar: a multiple clock domain, power-aware, tile-based embedded processor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[22] Rajeev Barua,et al. Compiler Support for Scalable and Efficient Memory Systems , 2001, IEEE Trans. Computers.
[23] H. T. Kung,et al. The Warp Computer: Architecture, Implementation, and Performance , 1987, IEEE Transactions on Computers.
[24] Henry Hoffmann,et al. Stream Algorithms and Architecture , 2004, J. Instr. Level Parallelism.
[25] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput.
[26] Jack B. Dennis,et al. A preliminary architecture for a basic data-flow processor , 1974, ISCA '98.
[27] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[28] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[29] William J. Dally,et al. The J-machine Multicomputer: An Architectural Evaluation , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[30] A. J. KleinOsowski,et al. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.
[31] John Kubiatowicz,et al. Integrated shared-memory and message-passing communication in the Alewife multiprocessor , 1998 .
[32] Henk Corporaal,et al. Partitioned register file for TTAs , 1995, MICRO 1995.
[33] Ken Mai,et al. The future of wires , 2001, Proc. IEEE.
[34] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[35] Arvind,et al. The Evolution of Dataflow Architectures: from Static Dataflow to P-RISC , 1993, Int. J. High Speed Comput.
[36] James Demmel,et al. LAPACK: A portable linear algebra library for high-performance computers , 1990, Proceedings SUPERCOMPUTING '90.
[37] T. Gross,et al. iWarp-anatomy of a parallel computing system , 1999, IEEE Concurrency.
[38] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[39] Cameron McNairy,et al. Itanium 2 Processor Microarchitecture , 2003, IEEE Micro.
[40] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[41] Timothy Mark Pinkston,et al. A Progressive Approach to Handling Message-Dependent Deadlock in Parallel Computer Systems , 2003, IEEE Trans. Parallel Distributed Syst.
[42] David Wentzlaff. Architectural implications of bit-level computation in communication applications , 2002 .
[43] Nate Kushman,et al. Performance Nonmonotonicities: A Case Study of the UltraSPARC Processor , 1998 .
[44] Michael I. Gordon,et al. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.
[45] Paul S. Zuchowski,et al. Technology-migratable ASIC library design , 1996, IBM J. Res. Dev.
[46] Robert H. Dennard,et al. Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions , 1974 .
[47] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[48] Robert H. Dennard,et al. CMOS scaling for high performance and low power-the next ten years , 1995, Proc. IEEE.
[49] Xia Chen,et al. A spatial path scheduling algorithm for EDGE architectures , 2006, ASPLOS XII.
[50] David G. Chinnery,et al. Closing the Gap Between ASIC and Custom - Tools and Techniques for High-Performance ASIC Design , 2002 .
[51] M. Bohr. Interconnect scaling-the real limiter to high performance ULSI , 1995, Proceedings of International Electron Devices Meeting.
[52] Norman P. Jouppi,et al. The multicluster architecture: reducing cycle time through partitioning , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[53] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[54] Steven Swanson,et al. Instruction scheduling for a tiled dataflow architecture , 2006, ASPLOS XII.
[55] Gerald H. Hilderink,et al. Parallel Processing — the picoChip way! , 2003 .
[56] José Duato,et al. A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks , 1994, 1994 International Conference on Parallel Processing Vol. 1.
[57] Stephen H. Gunther,et al. Managing the Impact of Increasing Microprocessor Power Consumption , 2001 .
[58] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[59] William Thies,et al. Linear analysis and optimization of stream programs , 2003, PLDI '03.
[60] Aaron Smith,et al. Compiling for EDGE architectures , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[61] William J. Dally,et al. The Imagine Stream Processor , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[62] Victor Lee,et al. Exploiting two-case delivery for fast protected messaging , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.
[63] Kathryn S. McKinley,et al. Static placement, dynamic issue (SPDI) scheduling for EDGE architectures , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[64] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.
[65] Christopher Batten,et al. The vector-thread architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[66] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[67] Balaram Sinharoy,et al. POWER4 system microarchitecture , 2002, IBM J. Res. Dev.
[68] Donald Yeung,et al. SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures , 2001, IEEE Trans. Parallel Distributed Syst.
[69] Ho-Seop Kim,et al. An instruction set and microarchitecture for instruction level distributed processing , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[70] Rajeev Barua,et al. Compiler-managed memory system for software-exposed architectures , 2000 .
[71] Henk Corporaal,et al. MOVE: a framework for high-performance processor design , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[72] Jiawen Chen,et al. A reconfigurable architecture for load-balanced rendering , 2005, HWWS '05.
[73] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001 .
[74] Stephen P. Crago,et al. A performance analysis of PIM, stream processing, and tiled processing on memory-intensive signal processing kernels , 2003, ISCA '03.
[75] R. Nagarajan,et al. A design space evaluation of grid processor architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[76] David Shoemaker,et al. NuMesh: An architecture optimized for scheduled communication , 2004, The Journal of Supercomputing.
[77] William J. Dally,et al. Principles and Practices of Interconnection Networks , 2004 .
[78] Henry Hoffmann,et al. A stream compiler for communication-exposed architectures , 2002, ASPLOS X.
[79] Christoforos E. Kozyrakis,et al. Overcoming the limitations of conventional vector processors , 2003, ISCA '03.
[80] David Wentzlaff,et al. Energy characterization of a tiled architecture processor with on-chip networks , 2003, ISLPED '03.
[81] Anant Agarwal,et al. Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.
[82] Steven Swanson,et al. The WaveScalar architecture , 2007, TOCS.
[83] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[84] Thomas Schubert,et al. High-level formal verification of next-generation microprocessors , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).
[85] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[86] G.E. Moore,et al. Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.
[87] John Wawrzynek,et al. Garp: a MIPS processor with a reconfigurable coprocessor , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[88] Anant Agarwal,et al. Software orchestration of instruction level parallelism on tiled processor architectures , 2005 .
[89] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.
[90] Anant Agarwal,et al. Anatomy of a message in the Alewife multiprocessor , 1993, ICS '93.
[91] P. Bai,et al. A high performance 180 nm generation logic technology , 1998, International Electron Devices Meeting 1998. Technical Digest (Cat. No.98CH36217).
[92] P. Buffet,et al. Methodology for I/O cell placement and checking in ASIC designs using area-array power grid , 2000, Proceedings of the IEEE 2000 Custom Integrated Circuits Conference (Cat. No.00CH37044).
[93] Michael Taylor. Deionizer: A Tool for Capturing and Embedding I/O Cells , 2004 .
[94] William Thies,et al. Optimizing stream programs using linear state space analysis , 2005, CASES '05.
[95] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[96] Matthew Mattina,et al. Tarantula: a vector extension to the alpha architecture , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[97] Henk Corporaal. Transport Triggered Architectures : Design and Evaluation , 1995 .
[98] Noah Treuhaft,et al. Scalable Processors in the Billion-Transistor Era: IRAM , 1997, Computer.
[99] Doug Matzke,et al. Will Physical Scalability Sabotage Performance Gains? , 1997, Computer.
[100] Anant Agarwal,et al. How to build scalable on-chip ILP networks for a decentralized architecture , 2000 .
[101] Saman P. Amarasinghe,et al. Maps: a compiler-managed memory system for Raw machines , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[102] William J. Dally,et al. A VLSI Architecture for Concurrent Data Structures , 1987 .
[103] William J. Dally,et al. A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.