High message rate, NIC-based atomics: Design and performance considerations
Karl S. Hemmert | Keith D. Underwood | Ron Brightwell | Michael Levenhagen
[1] Maged M. Michael, et al. Implementation of atomic primitives on distributed shared memory multiprocessors, 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[2] Michael L. Scott, et al. Algorithms for scalable synchronization on shared-memory multiprocessors, 1991, TOCS.
[3] Keith D. Underwood, et al. Simulating Red Storm: Challenges and Successes in Building a System Simulation, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[4] Keith D. Underwood, et al. Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[5] Wu-chun Feng, et al. The Quadrics Network: High-Performance Clustering Technology, 2002, IEEE Micro.
[6] Robert W. Numrich, et al. Co-array Fortran for parallel programming, 1998, FORF.
[7] Hermann Hellwagner, et al. SCI: Scalable Coherent Interface: Architecture and Software for High-Performance Compute Clusters, 1999.
[8] Keith D. Underwood, et al. A comparison of 4X InfiniBand and Quadrics Elan-4 technologies, 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[9] William Gropp, et al. NIC-based atomic operations on Myrinet/GM, 2002.
[10] Keith D. Underwood, et al. Accelerating List Management for MPI, 2005, 2005 IEEE International Conference on Cluster Computing.
[11] Steve Scott, et al. The Cray BlackWidow: a highly scalable vector multiprocessor, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[12] Courtenay T. Vaughan, et al. A Simple Synchronous Distributed-Memory Algorithm for the HPCC RandomAccess Benchmark, 2006, 2006 IEEE International Conference on Cluster Computing.
[13] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0, 1997, CARN.
[14] Keith D. Underwood, et al. SeaStar Interconnect: Balanced Bandwidth for Scalable Performance, 2006, IEEE Micro.
[15] Katherine Yelick, et al. Introduction to UPC and Language Specification, 2000.
[16] Jon Beecroft, et al. Meiko CS-2 Interconnect Elan-Elite Design, 1994, Parallel Comput.
[17] Steven L. Scott, et al. Synchronization and communication in the T3E multiprocessor, 1996, ASPLOS VII.
[18] Karl S. Hemmert, et al. A hardware acceleration unit for MPI queue processing, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[19] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.
[20] Jack Dongarra, et al. Introduction to the HPCChallenge Benchmark Suite, 2004.
[21] Jack J. Dongarra, et al. The LINPACK Benchmark: An Explanation, 1988, ICS.
[22] Bill Nitzberg, et al. Distributed shared memory: a survey of issues and algorithms, 1991, Computer.
[23] Yogish Sabharwal, et al. Software Routing and Aggregation of Messages to Optimize the Performance of HPCC Randomaccess Benchmark, 2006, ACM/IEEE SC 2006 Conference (SC'06).