Extending τ-Lop to model concurrent MPI communications in multicore clusters
[1] Alexey L. Lastovetsky,et al. An accurate communication model of a heterogeneous cluster based on a switch-enabled Ethernet network, 2006, 12th International Conference on Parallel and Distributed Systems (ICPADS'06).
[2] Ziming Zhong,et al. FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms , 2014, The Journal of Supercomputing.
[3] Luiz Angelo Steffenel,et al. Identifying Logical Homogeneous Clusters for Efficient Wide-Area Communications , 2004, PVM/MPI.
[4] Rong Ge,et al. $\log_{\rm n}{\rm P}$ and $\log_{3}{\rm P}$: Accurate Analytical Models of Point-to-Point Communication in Distributed Systems , 2007, IEEE Transactions on Computers.
[5] Susumu Shibusawa,et al. Scheduling algorithms for efficient gather operations in distributed heterogeneous systems , 2000, Proceedings 2000. International Workshop on Parallel Processing.
[6] Torsten Hoefler,et al. LogfP - a model for small messages in InfiniBand , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[7] Rolf Riesen,et al. Communication Models for Resource Constrained Hierarchical Ethernet Networks , 2013, Euro-Par Workshops.
[8] Jin Zhang,et al. LogGPO: An accurate communication model for performance prediction of MPI programs , 2009, Science in China Series F: Information Sciences.
[9] Rong Ge,et al. Predicting and Evaluating Distributed Communication Performance , 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[10] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[11] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[12] Jesper Larsson Träff,et al. An Optimal Broadcast Algorithm Adapted to SMP Clusters , 2005, PVM/MPI.
[13] Jean-François Méhaut,et al. Prediction of Communication Latency over Complex Network Behaviors on SMP Clusters , 2005, EPEW/WS-FM.
[14] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[15] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..
[16] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience: Research Articles, 2007.
[17] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[18] Sayantan Sur,et al. LiMIC: support for high-performance MPI intra-node communication on Linux cluster , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[19] Fukuhito Ooshita,et al. Efficient gather operation in heterogeneous cluster systems , 2002, Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications.
[20] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[21] Jack J. Dongarra,et al. Performance analysis of MPI collective operations , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[22] Csaba Andras Moritz,et al. LoGPC: Modeling Network Contention in Message-Passing Programs , 2001, IEEE Trans. Parallel Distributed Syst..
[23] Juan Carlos Díaz Martín,et al. τ-Lop: Modeling performance of shared memory MPI , 2015, Parallel Comput..
[24] Alexey L. Lastovetsky,et al. Revisiting communication performance models for computational clusters , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[25] Csaba Andras Moritz,et al. Performance Modeling and Evaluation of MPI , 2001, J. Parallel Distributed Comput..
[26] Viktor K. Prasanna,et al. Efficient collective communication in distributed heterogeneous systems , 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).
[27] Alexey L. Lastovetsky,et al. Modeling Contention and Mapping Effects in Multi-core Clusters , 2015, Euro-Par Workshops.
[28] Jean-François Méhaut,et al. A Contention-Aware Performance Model for HPC-Based Networks: A Case Study of the InfiniBand Network , 2011, Euro-Par.
[29] Dhabaleswar K. Panda,et al. High performance RDMA-based MPI implementation over InfiniBand , 2003, ICS.
[30] Roger W. Hockney. The communication challenge for MPP, 1994.
[31] Alexey L. Lastovetsky,et al. Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models , 2009, Int. J. High Perform. Comput. Appl..
[32] Xiaofang Zhao,et al. Performance analysis and optimization of MPI collective operations on multi-core clusters , 2009, The Journal of Supercomputing.
[33] Jérôme Vienne,et al. Predictive models for bandwidth sharing in high performance clusters, 2008, CLUSTER 2008.
[34] Ramesh Subramonian,et al. LogP: a practical model of parallel computation , 1996, CACM.
[35] Rolf Rabenseifner,et al. Automatic Profiling of MPI Applications with Hardware Performance Counters , 1999, PVM/MPI.
[36] Luiz Angelo Steffenel,et al. Modeling Network Contention Effects on All-to-All Operations , 2006, 2006 IEEE International Conference on Cluster Computing.
[37] Laxmikant V. Kalé,et al. A framework for collective personalized communication , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[38] Liang Yuan,et al. LogGPH: A Parallel Computational Model with Hierarchical Communication Awareness , 2010, 2010 13th IEEE International Conference on Computational Science and Engineering.
[39] Kees Verstoep,et al. Fast Measurement of LogP Parameters for Message Passing Platforms , 2000, IPDPS Workshops.
[40] Fumihiko Ino,et al. LogGPS: a parallel computational model for synchronization analysis , 2001, PPoPP '01.
[41] Alexey L. Lastovetsky,et al. Data Partitioning with a Functional Performance Model of Heterogeneous Processors , 2007, Int. J. High Perform. Comput. Appl..
[42] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[43] Kirk W. Cameron,et al. Quantifying locality effect in data access delay: memory logP , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[44] Sang Cheol Kim,et al. Measurement and Prediction of Communication Delays in Myrinet Networks , 2001, J. Parallel Distributed Comput..
[45] Alexey L. Lastovetsky,et al. High Performance Heterogeneous Computing , 2009, Wiley series on parallel and distributed computing.
[46] Jesper Larsson Träff,et al. More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems , 2004, PVM/MPI.
[47] Robert A. van de Geijn,et al. On optimizing collective communication , 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[48] Ziming Zhong,et al. FuPerMod: A Framework for Optimal Data Partitioning for Parallel Scientific Applications on Dedicated Heterogeneous HPC Platforms , 2013, PaCT.
[49] Milton Abramowitz,et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.
[50] Torsten Hoefler,et al. Netgauge: A Network Performance Measurement Framework , 2007, HPCC.
[51] Brice Goglin,et al. KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework , 2013, J. Parallel Distributed Comput..
[52] Torsten Hoefler,et al. Multistage switches are not crossbars: Effects of static routing in high-performance networks , 2008, 2008 IEEE International Conference on Cluster Computing.
[53] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..