Scatter-add in data parallel architectures
[1] Leslie Kohn,et al. Introducing the Intel i860 64-bit microprocessor , 1989, IEEE Micro.
[2] Marc Tremblay,et al. VIS speeds new media processing , 1996, IEEE Micro.
[3] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.
[4] Dave Shreiner. OpenGL Reference Manual: The Official Reference Document to OpenGL, Version 1.2 , 1999 .
[5] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[6] Eric Darve,et al. Calculating Free Energies Using a Scaled-Force Molecular Dynamics Algorithm , 2002 .
[7] Larry Carter,et al. NAS Benchmarks on the Tera MTA , 1998 .
[8] Sony’s Emotionally Charged Chip , 1999 .
[9] Hans P. Zima,et al. The Earth Simulator , 2004, Parallel Comput..
[10] Duncan G. Elliott,et al. Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP , 1992, 1992 Proceedings of the IEEE Custom Integrated Circuits Conference.
[11] William J. Dally,et al. The VLSI implementation and evaluation of area-and energy-efficient streaming media processors , 2003 .
[12] Sanjay Ranka,et al. Array Combining Scatter Functions on Coarse-Grained, Distributed-Memory Parallel Machines , 1998 .
[13] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[14] Timothy Joe Williams. A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers , 1992 .
[15] Henry G. Dietz,et al. A case for aggregate networks , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[16] Jung Ho Ahn,et al. Merrimac: Supercomputing with Streams , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[17] Rice University,et al. High Performance Fortran language specification , 1993 .
[18] Michael Woodacre. The SGI® Altix 3000 Global Shared-Memory Architecture , 2003 .
[19] Richard M. Russell,et al. The CRAY-1 computer system , 1978, CACM.
[20] Seung-Moon Yoo,et al. FlexRAM: toward an advanced intelligent memory system , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).
[21] Christoforos Kozyrakis,et al. A Media-Enhanced Vector Architecture for Embedded Memory Systems , 1999 .
[22] William H. Press,et al. Numerical Recipes: FORTRAN , 1988 .
[23] William H. Press,et al. Numerical Recipes in Fortran 90 , 1996 .
[24] William J. Dally,et al. Programmable Stream Processors , 2003, Computer.
[25] R. E. Kessler,et al. Cray T3D: a new dimension for Cray Research , 1993, Digest of Papers. Compcon Spring.
[26] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[27] Aaas News,et al. Book Reviews , 1893, Buffalo Medical and Surgical Journal.
[28] Alvaro L. G. A. Coutinho,et al. Clustered Edge-by-Edge Preconditioners for Non-Symmetric Finite Element Equations , 1998 .
[29] Steven L. Scott,et al. Synchronization and communication in the T3E multiprocessor , 1996, ASPLOS VII.
[30] W. Daniel Hillis,et al. The CM-5 Connection Machine: a scalable supercomputer , 1993, CACM.
[31] A. Belegundu,et al. Introduction to Finite Elements in Engineering , 1990 .
[32] W. Daniel Hillis,et al. The Network Architecture of the Connection Machine CM-5 , 1996, J. Parallel Distributed Comput..
[33] Richard M. Brown,et al. The ILLIAC IV Computer , 1968, IEEE Transactions on Computers.
[34] Ralph Grishman,et al. The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer , 1983, IEEE Transactions on Computers.
[35] Frederic T. Chong,et al. Active pages: a computation model for intelligent memory , 1998, ISCA.
[36] Guy E. Blelloch,et al. Scan primitives for vector computers , 1990, Proceedings SUPERCOMPUTING '90.
[37] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[38] Leonid Oliker,et al. Memory-intensive benchmarks: IRAM vs. cache-based machines , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.