Performance Optimization and Modeling of Fine-Grained Irregular Communication in UPC
Xing Cai | Phuong Hoai Ha | Lukas Einkemmer | Martina Prugger | Johannes Langguth | Jérémie Lagravière
[1] Stefan Marr, et al. Partitioned Global Address Space Languages, 2015, ACM Comput. Surv.
[2] Katherine Yelick, et al. Optimizing partitioned global address space programs for cluster architectures, 2007.
[3] T. von Eicken, et al. Parallel programming in Split-C, 1993, Supercomputing '93.
[4] Tarek A. El-Ghazawi, et al. UPC Performance and Potential: A NPB Experimental Study, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[5] Tarek A. El-Ghazawi, et al. Fast address translation techniques for distributed shared memory compilers, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[6] Katherine Yelick, et al. UPC: Distributed Shared-Memory Programming, 2003.
[7] George Almási. PGAS (Partitioned Global Address Space) Languages , 2011, Encyclopedia of Parallel Computing.
[8] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[9] Changjun Hu, et al. Automatic tuning of sparse matrix-vector multiplication on multicore clusters, 2015, Science China Information Sciences.
[10] Gerhard Wellein, et al. Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model, 2014, ICS.
[11] Alexander Ostermann, et al. Evaluation of the partitioned global address space (PGAS) model for an inviscid Euler solver, 2016, Parallel Comput.
[12] Katherine A. Yelick, et al. Communication optimizations for fine-grained UPC applications, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[13] Hang Si, et al. TetGen, a Delaunay-Based Quality Tetrahedral Mesh Generator, 2015, ACM Trans. Math. Softw.
[14] Yili Zheng. Optimizing UPC programs for multi-core systems, 2010.
[15] Michail Alvanos, et al. Optimization techniques for fine-grained communication in PGAS environments, 2013.
[16] Nicholas J. Wright, et al. A programming model performance study using the NAS parallel benchmarks, 2010.
[17] Zhang Zhang, et al. Benchmark measurements of current UPC platforms, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[18] José Nelson Amaral, et al. A Characterization of Shared Data Access Patterns in UPC Programs, 2006, LCPC.
[19] Scott B. Baden, et al. Scalable Heterogeneous CPU-GPU Computations for Unstructured Tetrahedral Meshes, 2015, IEEE Micro.
[20] Nan Wu, et al. Parallel performance modeling of irregular applications in cell-centered finite volume methods over unstructured tetrahedral meshes, 2015, J. Parallel Distributed Comput.
[21] Andrea C. Arpaci-Dusseau, et al. Parallel programming in Split-C, 1993, Supercomputing '93. Proceedings.
[22] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[23] Katherine A. Yelick, et al. An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[24] Ami Marowka. Execution model of three parallel languages: OpenMP, UPC and CAF, 2005, Sci. Program.
[25] Juan Touriño, et al. Performance evaluation of sparse matrix products in UPC, 2013, The Journal of Supercomputing.
[26] Zhang Zhang, et al. A performance model for fine-grain accesses in UPC, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.