Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations
Torsten Hoefler | Felix Wolf | Sergei Shudler | Yannick Berens | Alexandru Calotoiu | Alexandre Strube
[1] Laxmikant V. Kalé,et al. Highly scalable parallel sorting , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[2] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst.
[3] Jeffrey S. Vetter,et al. Asserting Performance Expectations , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[4] Jiri Kraus,et al. GPUMAFIA: Efficient Subspace Clustering with MAFIA on GPUs , 2013, Euro-Par.
[5] Jesper Larsson Träff,et al. SKaMPI: a comprehensive benchmark for public benchmarking of MPI , 2002, Sci. Program.
[6] Torsten Hoefler,et al. Fast Multi-parameter Performance Modeling , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[7] Jesper Larsson Träff,et al. Self-Consistent MPI Performance Guidelines , 2010, IEEE Transactions on Parallel and Distributed Systems.
[8] Torsten Hoefler,et al. Accurately measuring collective operations at massive scale , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[9] Ronald L. Rivest,et al. Introduction to Algorithms , 1990 .
[10] Amith R. Mamidala,et al. PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[11] Dirk Schmidl,et al. Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir , 2011, Parallel Tools Workshop.
[12] Richard F. Gunst,et al. Applied Regression Analysis , 1999, Technometrics.
[13] Christof Vömel,et al. ScaLAPACK's MRRR algorithm , 2010, TOMS.
[14] Debra Hensgen,et al. Two algorithms for barrier synchronization , 1988, International Journal of Parallel Programming.
[15] Adolfy Hoisie,et al. Palm: easing the burden of analytical performance modeling , 2014, ICS '14.
[16] Bernd Mohr,et al. The Scalasca performance toolset architecture , 2010, Concurr. Comput. Pract. Exp.
[17] Xin Zhao,et al. Scalable Memory Use in MPI: A Case Study with MPICH2 , 2011, EuroMPI.
[18] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp.
[19] Christian H. Bischof,et al. How Many Threads will be too Many? On the Scalability of OpenMP Implementations , 2015, Euro-Par.
[20] Torsten Hoefler,et al. Using automated performance modeling to find scalability bugs in complex codes , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[21] Torsten Hoefler,et al. Characterizing the Influence of System Noise on Large-Scale Applications by Simulation , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[22] Susan Coghlan,et al. The Influence of Operating Systems on the Performance of Collective Operations at Extreme Scale , 2006, 2006 IEEE International Conference on Cluster Computing.
[23] Yannick Berens. Scalability Validation of Parallel Sorting Algorithms , 2017 .
[24] Torsten Hoefler,et al. Implementation and performance analysis of non-blocking collective operations for MPI , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[25] Torsten Hoefler,et al. Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications , 2017, PPoPP.
[26] Sascha Hunold,et al. Automatic Verification of Self-consistent MPI Performance Guidelines , 2016, Euro-Par.
[27] Torsten Hoefler,et al. MPI on Millions of Cores , 2022 .
[28] Torsten Hoefler,et al. The impact of network noise at large-scale communication performance , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[29] Guy E. Blelloch,et al. A comparison of sorting algorithms for the connection machine CM-2 , 1991, SPAA '91.
[30] Sascha Hunold,et al. MPI Benchmarking Revisited: Experimental Design and Reproducibility , 2015, ArXiv.
[31] Katherine E. Isaacs,et al. There goes the neighborhood: Performance degradation due to nearby jobs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[32] Torsten Hoefler,et al. Exascaling Your Library: Will Your Implementation Meet Your Expectations? , 2015, ICS.
[33] Peter Sanders,et al. Robust Massively Parallel Sorting , 2016, ALENEX.
[34] Sascha Hunold,et al. Reproducible MPI Benchmarking is Still Not as Easy as You Think , 2016, IEEE Transactions on Parallel and Distributed Systems.
[35] Peter Sanders. Algorithm Engineering - An Attempt at a Definition , 2009, Efficient Algorithms.
[36] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[37] Jesper Larsson Träff,et al. mpicroscope: Towards an MPI Benchmark Tool for Performance Guideline Verification , 2012, EuroMPI.
[38] Felix Wolf,et al. Parallel Sorting with Minimal Data , 2011, EuroMPI.
[39] Torsten Hoefler,et al. Generic topology mapping strategies for large-scale parallel architectures , 2011, ICS '11.
[40] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl.
[41] Peter J. Rousseeuw,et al. The Remedian: A Robust Averaging Method for Large Data Sets , 1990 .
[42] Scott B. Baden,et al. Modeling and predicting performance of high performance computing applications on hardware accelerators , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[43] Felix Wolf,et al. A Scalable Parallel Sorting Algorithm Using Exact Splitting , 2010 .
[44] Philip Heidelberger,et al. The IBM Blue Gene/Q interconnection network and message unit , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[45] Gabriel Wittum,et al. 10,000 Performance Models per Minute - Scalability of the UG4 Simulation Framework , 2015, Euro-Par.
[46] W. Hays. Applied Regression Analysis. 2nd ed. , 1981 .
[47] Felix Wolf,et al. Off-Road Performance Modeling - How to Deal with Segmented Data , 2017, Euro-Par.