Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture
Jeff LaCoss | John J. Granacki | William C. Athas | Jaewook Shin | Peter M. Kogge | Jeffrey T. Draper | Pedro C. Diniz | Jacqueline Chame | Apoorv Srivastava | Mary W. Hall | Jay B. Brockman | Vincent W. Freeh | Joonseok Park | Jefferey G. Koller
[1] Thomas L. Sterling, et al. Microservers: a new memory semantics for massively parallel computing, 1999, ICS '99.
[2] M. Birnbaum, et al. How VSIA Answers the SOC Dilemma, 1999, Computer.
[3] Erik Brunvand, et al. Impulse: building a smarter memory controller, 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[4] William J. Dally, et al. A bandwidth-efficient architecture for media processing, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[5] M. Oskin, et al. Active Pages: a computation model for intelligent memory, 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).
[6] Katherine Yelick, et al. A Case for Intelligent DRAM: IRAM, 1998.
[7] D. Culler, et al. Active messages: a mechanism for integrating communication and computation, 1998, ISCA '98.
[8] Martin C. Rinard, et al. Commutativity analysis: a new analysis technique for parallelizing compilers, 1997, TOPLAS.
[9] Jeffrey T. Draper, et al. A bus-efficient low-latency network interface for the PDSS multicomputer, 1997, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).
[10] D. Burger, et al. Datascalar Architectures, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[11] Christoforos E. Kozyrakis, et al. A case for intelligent RAM, 1997, IEEE Micro.
[12] Katherine Yelick, et al. A Case for Intelligent RAM: IRAM, 1997.
[13] Yunheung Paek, et al. Parallel Programming with Polaris, 1996, Computer.
[14] Monica S. Lam, et al. Maximizing Multiprocessor Performance with the SUIF Compiler, 1996, Digit. Tech. J.
[15] Aart J. C. Bik, et al. Simple Qualitative Experiments with a Sparse Compiler, 1996, LCPC.
[16] Fong Pong, et al. Missing the Memory Wall: The Case for Processor/Memory Integration, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[17] D. Burger, et al. Memory Bandwidth Limitations of Future Microprocessors, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[18] P.M. Kogge, et al. Pursuing a petaflop: point designs for 100 TF computers using PIM technologies, 1996, Proceedings of 6th Symposium on the Frontiers of Massively Parallel Computation (Frontiers '96).
[19] Jeffrey T. Draper, et al. The Red Rover Algorithm for Deadlock-Free Routing on Bidirectional Rings, 1996, PDPTA.
[20] Maya Gokhale, et al. Processing in Memory: The Terasys Massively Parallel PIM Array, 1995, Computer.
[21] J. Carter, et al. An argument for simple COMA, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[22] Ian T. Foster, et al. Designing and building parallel programs - concepts and tools for parallel software engineering, 1995.
[23] David Keppel, et al. Shade: a fast instruction-set simulator for execution profiling, 1994, SIGMETRICS.
[24] Ian Foster, et al. Designing and building parallel programs, 1994.
[25] A. Cozzolino, et al. PowerPC microprocessor family: the programming environments, 1994.
[26] Anne Rogers, et al. Early Experiences with Olden, 1993, LCPC.
[27] Monica S. Lam, et al. Global optimizations for parallelism and locality on scalable parallel machines, 1993, PLDI '93.
[28] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[29] William J. Dally, et al. The message-driven processor: a multicomputer processing node with efficient mechanisms, 1992, IEEE Micro.
[30] S. C. Knowles, et al. Arithmetic processor design for the T9000 transputer, 1991, Optics & Photonics.
[31] Maurice Herlihy, et al. Wait-free synchronization, 1991, TOPLAS.
[32] Michael Wolfe, et al. More iteration space tiling, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).