Cache injection for parallel applications
[1] Jack Dongarra, et al. Introduction to the HPCChallenge Benchmark Suite, 2004.
[2] V. E. Henson, et al. BoomerAMG: a parallel algebraic multigrid solver and preconditioner, 2002.
[3] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[4] Arthur B. Maccabe, et al. Reducing the Impact of the Memory Wall for I/O Using Cache Injection, 2007.
[5] Robert A. van de Geijn, et al. Building a high-performance collective communication library, 1994, Proceedings of Supercomputing '94.
[6] Greg J. Regnier, et al. TCP onloading for data center servers, 2004, Computer.
[7] Steven A. Moyer, et al. Access Ordering and Effective Memory Bandwidth, 1993.
[8] Sally A. McKee, et al. Hitting the memory wall: implications of the obvious, 1995, CARN.
[9] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[10] Dilma Da Silva, et al. Experience with K42, an open-source, Linux-compatible, scalable operating-system kernel, 2005, IBM Syst. J.
[11] Srihari Makineni, et al. Characterization of Direct Cache Access on multi-core systems and 10GbE, 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[12] Torsten Hoefler, et al. The PERCS High-Performance Interconnect, 2010, 2010 18th IEEE Symposium on High Performance Interconnects.
[13] Carl Staelin, et al. lmbench: Portable Tools for Performance Analysis, 1996, USENIX Annual Technical Conference.
[14] Rolf Riesen, et al. Instruction-level simulation of a cluster at scale, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[15] Lixin Zhang, et al. Mambo: a full system simulator for the PowerPC architecture, 2004, PERV.
[16] Balaram Sinharoy, et al. POWER5 system microarchitecture, 2005, IBM J. Res. Dev.
[17] John K. Ousterhout, et al. Why Aren't Operating Systems Getting Faster As Fast as Hardware?, 1990, USENIX Summer.
[18] Sally A. McKee, et al. Increasing Memory Bandwidth for Vector Computations, 1994, Programming Languages and System Architectures.
[19] Ram Huggahalli, et al. Impact of Cache Coherence Protocols on the Processing of Network Traffic, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[20] Anoop Gupta, et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors, 1991, J. Parallel Distributed Comput.
[21] Richard L. Graham, et al. Open MPI: A Flexible High Performance MPI, 2005, PPAM.
[22] Ram Huggahalli, et al. Direct cache access for high bandwidth network I/O, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[23] Jeffrey S. Vetter, et al. Statistical scalability analysis of communication operations in distributed applications, 2001, PPoPP '01.
[24] Peter M. Kogge, et al. On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications, 2007, IEEE Transactions on Computers.
[25] Ron Brightwell, et al. Characterizing application sensitivity to OS interference using kernel-level noise injection, 2008, HiPC 2008.
[26] Jehoshua Bruck, et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems, 1994, SPAA '94.
[27] Rolf Riesen. A Hybrid MPI Simulator, 2006, 2006 IEEE International Conference on Cluster Computing.
[28] Richard Murphy, et al. On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance, 2007, 2007 IEEE 10th International Symposium on Workload Characterization.
[29] Nikitas J. Dimopoulos, et al. Comparing Direct-to-Cache Transfer Policies to TCP/IP and M-VIA During Receive Operations in MPI Environments, 2007, ISPA.