Optimizing shared data accesses in distributed-memory X10 systems
[1] Alan L. Cox,et al. TreadMarks: shared memory computing on networks of workstations, 1996.
[2] Kathryn S. McKinley,et al. Data flow analysis for software prefetching linked data structures in Java , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[3] Laxmikant V. Kalé,et al. MSA: Multiphase Specifically Shared Arrays , 2004, LCPC.
[4] Jarek Nieplocha,et al. Advances, Applications and Performance of the Global Arrays Shared Memory Programming Toolkit , 2006, Int. J. High Perform. Comput. Appl.
[5] Vijayalakshmi Srinivasan,et al. A Tagless Coherence Directory , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] José González,et al. Owner Prediction for Accelerating Cache-to-Cache Transfer Misses in a cc-NUMA Architecture , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[7] Katherine A. Yelick,et al. Hybrid PGAS runtime support for multicore nodes , 2010, PGAS '10.
[8] Katherine A. Yelick,et al. Communication optimizations for fine-grained UPC applications , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[9] Kai Li,et al. IVY: A Shared Virtual Memory System for Parallel Computing , 1988, ICPP.
[10] Anoop Gupta,et al. The directory-based cache coherence protocol for the DASH multiprocessor , 1990, ISCA '90.
[11] José Nelson Amaral,et al. Improving communication in PGAS environments: static and dynamic coalescing in UPC , 2013, ICS '13.
[12] Willy Zwaenepoel,et al. Techniques for reducing consistency-related communication in distributed shared-memory systems , 1995, TOCS.
[13] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[14] Martin Burtscher,et al. Delphi: Prediction-based Page Prefetching to Improve the Performance of Shared Virtual Memory Systems , 2002, PDPTA.
[15] Phillip Colella,et al. An adaptive mesh refinement benchmark for modern parallel programming languages , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[16] James K. Archibald,et al. Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.
[17] Alan L. Cox,et al. Lazy release consistency for software distributed shared memory , 1992, ISCA '92.
[18] Stephen L. Olivier,et al. UTS: An Unbalanced Tree Search Benchmark , 2006, LCPC.
[19] Vivek Sarkar,et al. Communication Optimizations for Distributed-Memory X10 Programs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[20] Andrew Brownsword,et al. Synchronization via scheduling: techniques for efficiently managing shared state , 2011, PLDI '11.
[21] George Almási,et al. Scalable RDMA performance in PGAS languages , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[22] José Nelson Amaral,et al. Shared memory programming for large scale machines , 2006, PLDI '06.
[23] Laurie J. Hendren,et al. Communication optimizations for parallel C programs , 1998, J. Parallel Distributed Comput.
[24] Willy Zwaenepoel,et al. Munin: distributed shared memory based on type-specific memory coherence , 1990, PPOPP '90.
[25] Kevin Skadron,et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs , 2009, ICS.
[26] Babak Falsafi,et al. Last-Touch Correlated Data Streaming , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[27] Ye Sun,et al. Distributed transactional memory for metric-space networks , 2005, Distributed Computing.
[28] Bradford L. Chamberlain,et al. Software transactional memory for large scale clusters , 2008, PPoPP.
[29] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[30] David Cunningham,et al. A performance model for X10 applications: what's going on under the hood? , 2011, X10 '11.
[31] Daniel A. Reed,et al. Dynamic object management for distributed data structures , 1992, Proceedings Supercomputing '92.