A Survey of Communication Performance Models for High-Performance Computing
Alexey L. Lastovetsky | Juan Carlos Díaz Martín | Ravi Reddy | Juan A. Rico-Gallego
[1] Jean-François Méhaut, et al. A Contention-Aware Performance Model for HPC-Based Networks: A Case Study of the InfiniBand Network, 2011, Euro-Par.
[2] Duncan A. Grove, et al. Precise MPI Performance Measurement Using MPIBench, 2001.
[3] Alexey L. Lastovetsky, et al. Accurate Heterogeneous Communication Models and a Software Tool for Their Efficient Estimation, 2010, Int. J. High Perform. Comput. Appl.
[4] Torsten Hoefler, et al. LogfP - a model for small messages in InfiniBand, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[5] Viktor K. Prasanna, et al. Efficient collective communication in distributed heterogeneous systems, 2003, J. Parallel Distributed Comput.
[6] Alexey L. Lastovetsky, et al. Building the communication performance model of heterogeneous clusters based on a switched network, 2007, 2007 IEEE International Conference on Cluster Computing.
[7] Jesper Larsson Träff, et al. More Efficient Reduction Algorithms for Non-Power-of-Two Number of Processors in Message-Passing Parallel Systems, 2004, PVM/MPI.
[8] Lionel M. Ni, et al. Construction of optimal multicast trees based on the parameterized communication model, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[9] Henri Casanova, et al. Versatile, scalable, and accurate simulation of distributed applications and platforms, 2014, J. Parallel Distributed Comput.
[10] Luiz Angelo Steffenel, et al. Fast Tuning of Intra-cluster Collective Communications, 2004, PVM/MPI.
[11] Torsten Hoefler, et al. A practical approach to the rating of barrier algorithms using the LogP model and Open MPI, 2005, 2005 International Conference on Parallel Processing Workshops (ICPPW'05).
[12] Dhabaleswar K. Panda, et al. Efficient collective communication on heterogeneous networks of workstations, 1998, Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205).
[13] Roger W. Hockney, et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2, 1994, Parallel Computing.
[14] Jack J. Dongarra, et al. Performance analysis of MPI collective operations, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[15] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[16] D.E. Culler, et al. Effects Of Communication Latency, Overhead, And Bandwidth In A Cluster Architecture, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[17] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[18] Kees Verstoep, et al. Fast Measurement of LogP Parameters for Message Passing Platforms, 2000, IPDPS Workshops.
[19] Csaba Andras Moritz, et al. LoGPC: Modeling Network Contention in Message-Passing Programs, 2001, IEEE Trans. Parallel Distributed Syst.
[20] Sang Cheol Kim, et al. Measurement and Prediction of Communication Delays in Myrinet Networks, 2001, J. Parallel Distributed Comput.
[21] Fumihiko Ino, et al. LogGPS: a parallel computational model for synchronization analysis, 2001, PPoPP '01.
[22] Sascha Hunold, et al. MPI Benchmarking Revisited: Experimental Design and Reproducibility, 2015, ArXiv.
[23] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[24] Luiz Angelo Steffenel, et al. Modeling Network Contention Effects on All-to-All Operations, 2006, 2006 IEEE International Conference on Cluster Computing.
[25] John L. Hennessy, et al. The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors, 1995.
[26] Cho-Li Wang, et al. Realistic communication model for parallel computing on cluster, 1999, ICWC 99. IEEE Computer Society International Workshop on Cluster Computing.
[27] Bruce M. Maggs, et al. Models of Parallel Computation: A Survey and Synthesis, 1995, Proceedings of the 28th Annual Hawaii International Conference on System Sciences.
[28] Teck Chaw Ling, et al. Performance modeling for hierarchical graph partitioning in heterogeneous multi-core environment, 2015, Parallel Comput.
[29] Alexey L. Lastovetsky, et al. Model-Based Optimization of EULAG Kernel on Intel Xeon Phi Through Load Imbalancing, 2017, IEEE Transactions on Parallel and Distributed Systems.
[30] Alexey L. Lastovetsky, et al. Accurate and Efficient Estimation of Parameters of Heterogeneous Communication Performance Models, 2009, Int. J. High Perform. Comput. Appl.
[31] Wahid Nasri, et al. PLP: Towards a realistic and accurate model for communication performances on hierarchical cluster-based systems, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[32] Kees Verstoep, et al. Network performance-aware collective communication for clustered wide-area systems, 2001, Parallel Comput.
[33] Torsten Hoefler, et al. Group Operation Assembly Language - A Flexible Way to Express Collective Communication, 2009, 2009 International Conference on Parallel Processing.
[34] Laxmikant V. Kalé, et al. A framework for collective personalized communication, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[35] Liang Yuan, et al. LogGPH: A Parallel Computational Model with Hierarchical Communication Awareness, 2010, 2010 13th IEEE International Conference on Computational Science and Engineering.
[36] Richard M. Karp, et al. Optimal broadcast and summation in the LogP model, 1993, SPAA '93.
[37] Joseph JáJá, et al. An Introduction to Parallel Algorithms, 1992.
[38] Richard P. Martin, et al. Assessing Fast Network Interfaces, 1996, IEEE Micro.
[39] Franck Cappello, et al. HiHCoHP - Toward a realistic communication model for hierarchical hyperclusters of heterogeneous processors, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[40] Robert A. van de Geijn, et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm, 1995.
[41] Csaba Andras Moritz, et al. LoGPC: modeling network contention in message-passing programs, 1998, SIGMETRICS '98/PERFORMANCE '98.
[42] Jesper Larsson Träff, et al. An Optimal Broadcast Algorithm Adapted to SMP Clusters, 2005, PVM/MPI.
[43] Rong Ge, et al. lognP and log3P: Accurate Analytical Models of Point-to-Point Communication in Distributed Systems, 2007, IEEE Transactions on Computers.
[44] Susumu Shibusawa, et al. Scheduling algorithms for efficient gather operations in distributed heterogeneous systems, 2000, Proceedings 2000. International Workshop on Parallel Processing.
[45] Alexey L. Lastovetsky, et al. Model-based optimization of MPDATA on Intel Xeon Phi through load imbalancing, 2015, ArXiv.
[46] Alexey L. Lastovetsky, et al. An accurate communication model of a heterogeneous cluster based on a switch-enabled Ethernet network, 2006, 12th International Conference on Parallel and Distributed Systems (ICPADS'06).
[47] Franck Cappello, et al. An algorithmic model for heterogeneous hyper-clusters: rationale and experience, 2005, Int. J. Found. Comput. Sci.
[48] Xiaofang Zhao, et al. Performance analysis and optimization of MPI collective operations on multi-core clusters, 2009, The Journal of Supercomputing.
[49] Alexey L. Lastovetsky, et al. Model-Based Estimation of the Communication Cost of Hybrid Data-Parallel Applications on Heterogeneous Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.
[50] K. Cameron, et al. lognP and log3P: Accurate Analytical Models of Point-to-Point Communication in Distributed Systems, 2006.
[51] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[52] José Luis Bosque, et al. HLogGP: a new parallel computational model for heterogeneous clusters, 2004, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004.
[53] S. Sitharama Iyengar, et al. Introduction to parallel algorithms, 1998, Wiley series on parallel and distributed computing.
[54] Kirk W. Cameron, et al. Quantifying locality effect in data access delay: memory logP, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[55] Alexey N. Salnikov, et al. The Analysis of Cluster Interconnect with the Network_Tests2 Toolkit, 2011, EuroMPI.
[56] Robert A. van de Geijn, et al. Collective communication: theory, practice, and experience, 2007.
[57] John Shalf, et al. The International Exascale Software Project roadmap, 2011, Int. J. High Perform. Comput. Appl.
[58] Alexey L. Lastovetsky, et al. Adaptive parallel computing on heterogeneous networks with mpC, 2002, Parallel Comput.
[59] Nor Asilah Wati Abdul Hamid, et al. Comparison of MPI Benchmark Programs on Shared Memory and Distributed Memory Machines (Point-to-Point Communication), 2010, Int. J. High Perform. Comput. Appl.
[60] Rolf Riesen, et al. Communication Models for Resource Constrained Hierarchical Ethernet Networks, 2013, Euro-Par Workshops.
[61] Jin Zhang, et al. LogGPO: An accurate communication model for performance prediction of MPI programs, 2009, Science in China Series F: Information Sciences.
[62] Rong Ge, et al. Predicting and Evaluating Distributed Communication Performance, 2004, Proceedings of the ACM/IEEE SC2004 Conference.
[63] Alexey L. Lastovetsky, et al. Hierarchical redesign of classic MPI reduction algorithms, 2016, The Journal of Supercomputing.
[64] Juan Carlos Díaz Martín, et al. τ-Lop: Modeling performance of shared memory MPI, 2015, Parallel Comput.
[65] Alexey L. Lastovetsky, et al. Revisiting communication performance models for computational clusters, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[66] Massimo Bernaschi, et al. Collective communication operations: experimental results vs. theory, 1998.
[67] Torsten Hoefler, et al. Multistage switches are not crossbars: Effects of static routing in high-performance networks, 2008, 2008 IEEE International Conference on Cluster Computing.
[68] Mary K. Vernon, et al. LoPC: modeling contention in parallel algorithms, 1997, PPOPP '97.
[69] Kuo-Chan Huang, et al. An Improved Model for Predicting HPL Performance, 2007, GPC.
[70] Luiz Angelo Steffenel, et al. Total Exchange Performance Modelling Under Network Contention, 2005, PPAM.
[71] Jeff Rothenberg, et al. The nature of modeling, 1989.
[72] Ramesh Subramonian, et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.
[73] Jean-François Méhaut, et al. Prediction of Communication Latency over Complex Network Behaviors on SMP Clusters, 2005, EPEW/WS-FM.
[74] Chris J. Scheiman, et al. LogGP: incorporating long messages into the LogP model - one step closer towards a realistic model for parallel computation, 1995, SPAA '95.
[75] Bowen Alpern, et al. A model for hierarchical memory, 1987, STOC.
[76] Andrea C. Arpaci-Dusseau, et al. Fast Parallel Sorting Under LogP: Experience with the CM-5, 1996, IEEE Trans. Parallel Distributed Syst.
[77] Cho-Li Wang, et al. Contention-Aware Communication Schedule for High-Speed Communication, 2003, Cluster Computing.
[78] Torsten Hoefler, et al. Netgauge: A Network Performance Measurement Framework, 2007, HPCC.
[79] A. Lumsdaine, et al. LogGOPSim: simulating large-scale applications in the LogGOPS model, 2010, HPDC '10.
[80] Jesper Larsson Träff, et al. SKaMPI: a comprehensive benchmark for public benchmarking of MPI, 2002, Sci. Program.
[81] Mario Lauria, et al. LogP performance characterization of fast messages atop Myrinet, 1998, Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing - PDP '98.
[82] Torsten Hoefler, et al. A Communication Model for Small Messages with InfiniBand, 2005.
[83] Alexey L. Lastovetsky, et al. Topology-oblivious optimization of MPI broadcast algorithms on extreme-scale platforms, 2015, Simul. Model. Pract. Theory.
[84] Massimo Bernaschi, et al. Collective communication operations: experimental results vs. theory, 1998, Concurr. Pract. Exp.
[85] Sascha Hunold, et al. Reproducible MPI Benchmarking is Still Not as Easy as You Think, 2016, IEEE Transactions on Parallel and Distributed Systems.
[86] Eunice E. Santos, et al. Optimal and Near-Optimal Algorithms for k-Item Broadcast, 1999, J. Parallel Distributed Comput.
[87] 坂本 文人, et al. Report on a Stay at Argonne National Laboratory (Argonne National Laboratory 滞在記), 2005.
[88] Jean-Marc Vincent, et al. Predictive models for bandwidth sharing in high performance clusters, 2008, 2008 IEEE International Conference on Cluster Computing.
[89] Ramesh Subramonian, et al. LogP: a practical model of parallel computation, 1996, CACM.
[90] Alexey L. Lastovetsky, et al. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms, 2015, The Journal of Supercomputing.
[91] Torsten Hoefler, et al. LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations, 2009, Simul. Model. Pract. Theory.
[92] Dhabaleswar K. Panda, et al. Communication modeling of heterogeneous networks of workstations for performance characterization of collective operations, 1999, Proceedings. Eighth Heterogeneous Computing Workshop (HCW'99).
[93] Dave Turner, et al. Protocol-dependent message-passing performance on Linux clusters, 2002, Proceedings. IEEE International Conference on Cluster Computing.
[94] Peng-Jun Wan, et al. A Parallel Computational Model for Heterogeneous Clusters, 2006.
[95] Alexey L. Lastovetsky, et al. New Model-Based Methods and Algorithms for Performance and Energy Optimization of Data Parallel Applications on Homogeneous Multicore Clusters, 2017, IEEE Transactions on Parallel and Distributed Systems.
[96] Luis Pastor, et al. A Parallel Computational Model for Heterogeneous Clusters, 2006, IEEE Transactions on Parallel and Distributed Systems.
[97] Fukuhito Ooshita, et al. Efficient gather operation in heterogeneous cluster systems, 2002, Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications.
[98] Amotz Bar-Noy, et al. Designing broadcasting algorithms in the postal model for message-passing systems, 1992, SPAA '92.
[99] Alexey L. Lastovetsky, et al. Extending τ-Lop to model concurrent MPI communications in multicore clusters, 2016, Future Gener. Comput. Syst.
[100] Michael Anthony Bauer, et al. Hpcbench - a Linux-based network benchmark for high performance networks, 2005, 19th International Symposium on High Performance Computing Systems and Applications (HPCS'05).
[101] Viktor K. Prasanna, et al. Adaptive communication algorithms for distributed heterogeneous systems, 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).