Memory and Control Organizations of Stream Processors: A Dissertation Submitted to the Department of Electrical Engineering and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
[1] C. Radke. International Conference on Computer Design , 2022 .
[2] William J. Dally,et al. Compiling for stream processing , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] William H. Press,et al. Numerical Recipes in Fortran 90 , 1996 .
[4] William J. Dally,et al. Programmable Stream Processors , 2003, Computer.
[5] Mateo Valero,et al. Command vector memory systems: high performance at low cost , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[6] Iain E. G. Richardson,et al. H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia , 2003 .
[7] Sally A. McKee,et al. Access order and effective bandwidth for streams on a Direct Rambus memory , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[8] Pat Hanrahan,et al. Brook for GPUs: stream computing on graphics hardware , 2004, ACM Trans. Graph..
[9] James Laudon,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, ISCA.
[10] Ralph Grishman,et al. The NYU ultracomputer—designing a MIMD, shared-memory parallel machine , 2018, ISCA '98.
[11] Larry Carter,et al. NAS Benchmarks on the Tera MTA , 1998 .
[12] Guy E. Blelloch,et al. Scan primitives for vector computers , 1990, Proceedings SUPERCOMPUTING '90.
[13] William J. Dally,et al. Evaluating the Imagine stream architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[14] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[15] S. Asano,et al. The design and implementation of a first-generation CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005..
[16] A. Belegundu,et al. Introduction to Finite Elements in Engineering , 1990 .
[17] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[18] James Smith,et al. A Simulation Study of the CRAY X-MP Memory System , 1986, IEEE Transactions on Computers.
[19] Mattan Erez,et al. Merrimac: high-performance and highly-efficient scientific computing with streams , 2006 .
[20] John D. Owens,et al. Computer graphics on a stream architecture , 2002 .
[21] Quinn Jacobson,et al. Trace processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[22] Leonid Oliker,et al. Memory-intensive benchmarks: IRAM vs. cache-based machines , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[23] Shreekant S. Thakkar,et al. Internet Streaming SIMD Extensions , 1999, Computer.
[24] Yale N. Patt,et al. One Billion Transistors, One Uniprocessor, One Chip , 1997, Computer.
[25] Timothy Joe Williams. A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers , 1992 .
[26] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[27] Anastasis A. Sofokleous,et al. Review: H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia , 2005, Comput. J..
[28] Henry G. Dietz,et al. A case for aggregate networks , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[29] William J. Dally,et al. Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.
[30] William J. Dally,et al. Data parallel address architecture , 2006, IEEE Computer Architecture Letters.
[31] Pat Hanrahan,et al. A real-time procedural shading system for programmable graphics hardware , 2001, SIGGRAPH.
[32] Fred Weber,et al. AMD 3DNow! technology: architecture and implementations , 1999, IEEE Micro.
[33] William J. Dally,et al. Exploring the VLSI scalability of stream processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..
[34] J. Little. A Proof for the Queuing Formula: L = λW , 1961 .
[35] Duncan G. Elliott,et al. Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP , 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[36] William J. Dally,et al. Stream register files with indexed access , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[37] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[38] M. Horowitz,et al. The stream virtual machine , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[39] William J. Dally,et al. Imagine: Media Processing with Streams , 2001, IEEE Micro.
[40] Dave Shreiner. OpenGL Reference Manual: The Official Reference Document to OpenGL, Version 1.2 , 1999 .
[41] William J. Dally,et al. Fault Tolerance Techniques for the Merrimac Streaming Supercomputer , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[42] R. E. Kessler,et al. Cray T3D: a new dimension for Cray Research , 1993, Digest of Papers. Compcon Spring.
[43] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[44] Christopher C. Hsiung,et al. Cray X-MP: the birth of a supercomputer , 1989, Computer.
[45] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[46] Edward A. Lee,et al. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing , 1989, IEEE Transactions on Computers.
[47] Eric Darve,et al. Calculating Free Energies Using a Scaled-Force Molecular Dynamics Algorithm , 2002 .
[48] Trevor Mudge,et al. Modern DRAM architectures , 2001 .
[49] Charles Clos,et al. A study of non-blocking switching networks , 1953 .
[50] Christopher Batten,et al. The Vector-Thread Architecture , 2004, ISCA 2004.
[51] William Thies,et al. StreamIt: A Language for Streaming Applications , 2002, CC.
[52] Leslie Kohn,et al. Introducing the Intel i860 64-bit microprocessor , 1989, IEEE Micro.
[53] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[54] David A. Patterson,et al. Scalable Vector Media-processors for Embedded Systems , 2002 .
[55] William J. Dally,et al. Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[56] Dean M. Tullsen,et al. Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[57] Noah Treuhaft,et al. Scalable Processors in the Billion-Transistor Era: IRAM , 1997, Computer.
[58] Mendel Rosenblum,et al. Stream programming on general-purpose processors , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[59] Michael Woodacre. The SGI® Altix 3000 Global Shared-Memory Architecture , 2003 .
[60] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[61] Kunle Olukotun,et al. The Stanford Hydra CMP , 2000, IEEE Micro.
[62] Ronald T. Williams,et al. RT_STAP: Real-Time Space-Time Adaptive Processing Benchmark , 1997 .
[63] William J. Dally,et al. Register organization for media processing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[64] Alvaro L. G. A. Coutinho,et al. Clustered Edge-by-Edge Preconditioners for Non-Symmetric Finite Element Equations , 1998 .
[65] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[66] W. Daniel Hillis,et al. The CM-5 Connection Machine: a scalable supercomputer , 1993, CACM.
[67] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[68] William J. Dally,et al. Analysis and Performance Results of a Molecular Modeling Application on Merrimac , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[69] J. W. Backus,et al. Can programming be liberated from the von Neumann style , 1977 .
[70] W. Dally,et al. Communication scheduling , 2000, SIGP.
[71] Jung Ho Ahn,et al. The Design Space of Data-Parallel Memory Systems , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[72] Christoforos E. Kozyrakis,et al. Overcoming the limitations of conventional vector processors , 2003, ISCA '03.
[73] Hunter Scales,et al. AltiVec Extension to PowerPC Accelerates Media Processing , 2000, IEEE Micro.
[74] B. Flachs,et al. A streaming processing unit for a CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005..
[75] Norman P. Jouppi,et al. The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays , 2002, ISCA.
[76] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[77] William J. Dally,et al. Conditional techniques for stream processing kernels , 2004 .
[78] Sanjay Ranka,et al. Array Combining Scatter Functions on Coarse-Grained, Distributed-Memory Parallel Machines , 1998 .
[79] William J. Dally,et al. Scatter-add in data parallel architectures , 2005, 11th International Symposium on High-Performance Computer Architecture.
[80] Gurindar S. Sohi,et al. Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[81] James E. Smith,et al. Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[82] Gurindar S. Sohi. High-Bandwidth Interleaved Memories for Vector Processors: A Simulation Study , 1993, IEEE Trans. Computers.