Optimizing message-passing on multicore architectures using hardware multi-threading
Tiziano De Matteis | Gabriele Mencagli | Daniele Buono | Marco Vanneschi
[1] Cezary Dubnicki,et al. VMMC-2: Efficient Support for Reliable, Connection-Oriented Communication, 1997.
[2] John Paul Shen,et al. Speculative precomputation: long-range prefetching of delinquent loads , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[3] Brian Armstrong,et al. Quantifying Differences between OpenMP and MPI Using a Large-Scale Application Suite , 2000, ISHPC.
[4] C. A. R. Hoare,et al. Communicating sequential processes , 1978, CACM.
[5] Andrew Lumsdaine,et al. Partial globalization of partitioned address spaces for zero-copy communication with shared memory , 2011, 2011 18th International Conference on High Performance Computing.
[6] Rupak Biswas,et al. The impact of hyper-threading on processor resource utilization in production applications , 2011, 2011 18th International Conference on High Performance Computing.
[7] Thorsten von Eicken,et al. U-Net: a user-level network interface for parallel and distributed computing , 1995, SOSP.
[8] Matthew Curtis-Maury,et al. Integrating multiple forms of multithreaded execution on multi-SMT systems: a study with scientific applications , 2005, Second International Conference on the Quantitative Evaluation of Systems (QEST'05).
[9] Géraud Krawezik,et al. Performance comparison of MPI and three OpenMP programming styles on shared memory multiprocessors , 2003, SPAA '03.
[10] Gabriele Mencagli,et al. Evaluation of Architectural Supports for Fine-Grained Synchronization Mechanisms, 2013.
[11] Seth Copen Goldstein,et al. Active Messages: A Mechanism for Integrated Communication and Computation , 1992, Proceedings of the 19th Annual International Symposium on Computer Architecture.
[12] Xingfu Wu,et al. Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers , 2011, PERV.
[13] Bernard Tourancheau,et al. BIP-SMP: High Performance Message Passing over a Cluster of Commodity SMPs , 1999, ACM/IEEE SC 1999 Conference (SC'99).
[14] George Bosilca,et al. Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms , 2013, J. Parallel Distributed Comput.
[15] Nectarios Koziris,et al. Overlapping computation and communication in SMT clusters with commodity interconnects , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[16] Michael Allen,et al. Parallel programming: techniques and applications using networked workstations and parallel computers, 1998.
[17] Stephen Jenks,et al. Architectural support for thread communications in multi-core processors , 2011, Parallel Comput.
[18] Hiroshi Tezuka,et al. The design and implementation of zero copy MPI using commodity hardware with a high performance network , 1998, ICS '98.
[19] Dhabaleswar K. Panda,et al. ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures , 2009, Computer Science - Research and Development.
[20] Torsten Hoefler,et al. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.