An Extended Roofline Model with Communication-Awareness for Distributed-Memory HPC Systems
[1] Andreas Gerstlauer,et al. A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores , 2014, J. Signal Process. Syst..
[2] Rajeev Thakur,et al. Improving the Performance of Collective Operations in MPICH , 2003, PVM/MPI.
[3] Tomás F. Pena,et al. 3DyRM: a dynamic roofline model including memory latency information , 2014, The Journal of Supercomputing.
[4] Ki-Hwan Kim,et al. Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model , 2011, Comput. Phys. Commun..
[5] Paul Jähne. Erzeugung minimaler Spannbäume auf ungerichteten, kantengewichteten Graphen mit den Algorithmen von Kruskal, Prim und Boruvka , 2015, GI-Jahrestagung.
[6] Georg Ofenbeck,et al. Applying the roofline model , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] Ruedi Steinmann. Applying the Roofline Model , 2012 .
[8] Laxmikant V. Kalé,et al. Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar , 2010, Int. J. High Perform. Comput. Appl..
[9] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[10] Richard W. Vuduc,et al. A Roofline Model of Energy , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[11] Frederico Pratas,et al. Cache-aware Roofline model: Upgrading the loft , 2014, IEEE Computer Architecture Letters.
[12] Kees Verstoep,et al. Fast Measurement of LogP Parameters for Message Passing Platforms , 2000, IPDPS Workshops.
[13] Guang R. Gao,et al. Extending the Roofline Model for Asynchronous Many-Task Runtimes , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[14] Diego Rossinelli,et al. Mesh–particle interpolations on graphics processing units and multicore central processing units , 2011, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[15] Leonel Sousa,et al. Performance Analysis with Cache-Aware Roofline Model in Intel Advisor , 2017, 2017 International Conference on High Performance Computing & Simulation (HPCS).
[16] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[17] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[18] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[19] Markus Püschel,et al. Extending the roofline model: Bottleneck analysis with microarchitectural constraints , 2014, 2014 IEEE International Symposium on Workload Characterization (IISWC).
[20] James Demmel,et al. the Parallel Computing Landscape , 2022 .
[21] Jae Wook Jeon,et al. A roofline model based on working set size for embedded systems , 2014, IEICE Electron. Express.
[22] Henk Corporaal,et al. The boat hull model: adapting the roofline model to enable performance prediction for parallel computing , 2012, PPoPP '12.
[23] Fabrice Rossi,et al. Mean Absolute Percentage Error for regression models , 2016, Neurocomputing.
[24] Gerth Stølting Brodal,et al. Cache-Oblivious Algorithms and Data Structures , 2004, SWAT.
[25] Emmanuel Jeannot,et al. Modeling Large Compute Nodes with Heterogeneous Memories with Cache-Aware Roofline Model , 2017, PMBS@SC.