Performance Evaluation and Analysis of Linear Algebra Kernels in the Prototype Tianhe-3 Cluster
Depei Qian | Hailong Yang | Zhongzhi Luan | Yi Liu | Xin You
[1] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[2] Michael Garland, et al. Merge-Based Parallel Sparse Matrix-Vector Multiplication, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Xing Liu, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors, 2013, ICS '13.
[4] Jørgen Fredsøe, et al. A wave generation toolbox for the open‐source CFD library: OpenFoam®, 2012.
[5] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[6] Jean Luca Bez, et al. Performance and energy efficiency analysis of HPC physics simulation applications in a cluster of ARM processors, 2017, Concurr. Comput. Pract. Exp.
[7] Timothy A. Davis, et al. The university of Florida sparse matrix collection, 2011, TOMS.
[8] P. O. A. Navaux, et al. Time-to-Solution and Energy-to-Solution: A Comparison between ARM and Xeon, 2012, 2012 Third Workshop on Applications for Multi-Core Architecture.
[9] Alejandro Rico, et al. Tibidabo: Making the case for an ARM-based HPC system, 2014, Future Gener. Comput. Syst.
[10] Avinash Sodani, et al. Knights landing (KNL): 2nd Generation Intel® Xeon Phi processor, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[11] Oliver Ray, et al. Automatically Tuning the GCC Compiler to Optimize the Performance of Applications Running on the ARM Cortex-M3, 2017, ArXiv.
[12] Alex Ramírez, et al. The low power architecture approach towards exascale computing, 2013, J. Comput. Sci.
[13] Jack Dongarra, et al. Report on the TianHe-2A System, 2017.
[14] Jesús Labarta, et al. The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform, 2018.
[15] Endong Wang, et al. Intel Math Kernel Library, 2014.
[16] Charles Zhang. Mars: A 64-core ARMv8 processor, 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[17] Amy Nicole Langville, et al. Google's PageRank and beyond - the science of search engine rankings, 2006.
[18] Brian Vinter, et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication, 2015, ICS.
[19] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[20] Jianbin Fang, et al. Optimizing Sparse Matrix–Vector Multiplications on an ARMv8-based Many-Core Architecture, 2019, International Journal of Parallel Programming.
[21] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[22] Christian F. A. Negre, et al. The basic matrix library (BML) for quantum chemistry, 2018, The Journal of Supercomputing.
[23] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[24] Eduard Ayguadé, et al. The Mont-Blanc Prototype: An Alternative Approach for HPC Systems, 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.