Performance and Power Efficient Massive Parallel Computational Model for HPC Heterogeneous Exascale Systems
[1] Rupak Biswas,et al. High performance computing using MPI and OpenMP on multi-core parallel systems , 2011, Parallel Comput..
[2] Min Zhou. Petascale adaptive computational fluid dynamics , 2009 .
[3] E. Wes Bethel,et al. Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems , 2012, IEEE Transactions on Visualization and Computer Graphics.
[4] Alejandro Duran,et al. The Intel® Many Integrated Core Architecture , 2012, 2012 International Conference on High Performance Computing & Simulation (HPCS).
[5] Stephen A. Jarvis,et al. Performance analysis of a hybrid MPI/CUDA implementation of the NAS-LU benchmark , 2011, PERV.
[6] Inanc Senocak,et al. Multi-level parallelism for incompressible flow computations on GPU clusters , 2013, Parallel Comput..
[7] Jack J. Dongarra,et al. Exascale computing and big data , 2015, Commun. ACM.
[8] Jack J. Dongarra,et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems , 2009, Parallel Comput..
[9] W. Zhang,et al. Warp-X: A new exascale computing platform for beam–plasma simulations , 2017, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment.
[10] Sunita Chandrasekaran,et al. Implementing the OpenACC Data Model , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[11] Thomas N. Theis,et al. The End of Moore's Law: A New Beginning for Information Technology , 2017, Computing in Science & Engineering.
[12] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[13] Andreas Kolb,et al. Flow Driven GPGPU Programming combining Textual and Graphical Programming , 2016, PMAM@PPoPP.
[14] Geoffrey C. Fox,et al. Parallel Computing Works , 1994 .
[15] Pete Beckman,et al. Argo: An Exascale Operating System and Runtime , 2015 .
[16] Reiji Suda,et al. Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA , 2009, 2009 International Conference on Computational Science and Engineering.
[17] Jack J. Dongarra,et al. Experiences in autotuning matrix multiplication for energy minimization on GPUs , 2015, Concurr. Comput. Pract. Exp..
[18] M. Usman Ashraf. Hybrid Model Based Testing Tool Architecture for Exascale Computing System , 2015 .
[19] Alex Ramírez,et al. The low-power architecture approach towards exascale computing , 2011, ScalA '11.
[20] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[21] Eric J. Kelmelis,et al. CULA: hybrid GPU accelerated linear algebra routines , 2010, Defense + Commercial Sensing.
[22] Robert Edwards,et al. Lattice QCD Application Development within the US DOE Exascale Computing Project , 2017 .
[23] Francisco de Sande,et al. Optimization strategies in different CUDA architectures using llCoMP , 2012, Microprocess. Microsystems.
[24] Bettina Schnor,et al. A comparison of CUDA and OpenACC: Accelerating the Tsunami Simulation EasyWave , 2014, ARCS Workshops.
[25] K. A. Gallivan,et al. Parallel Algorithms for Dense Linear Algebra Computations , 1990, SIAM Rev..
[26] Satoshi Matsuoka,et al. CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application , 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[27] Y. Raghu Reddy,et al. A hybrid MPI-OpenMP scheme for scalable parallel pseudospectral computations for fluid turbulence , 2010, Parallel Comput..
[28] Kwan-Liu Ma,et al. In-situ processing and visualization for ultrascale simulations , 2007 .
[29] Jack J. Dongarra,et al. The quest for petascale computing , 2001, Comput. Sci. Eng..
[30] Wolfgang Frings,et al. Measuring power consumption on IBM Blue Gene/P , 2011, Computer Science - Research and Development.
[31] Rajeev Thakur,et al. An implementation and evaluation of the MPI 3.0 one‐sided communication interface , 2016, Concurr. Comput. Pract. Exp..
[32] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[33] Dirk Eddelbuettel,et al. CRAN Task View: High-Performance and Parallel Computing with R , 2020 .
[34] Joseph Y.-T. Leung,et al. Handbook of Scheduling: Algorithms, Models, and Performance Analysis , 2004 .
[35] Bronis R. de Supinski,et al. Early Experiences Porting Three Applications to OpenMP 4.5 , 2016, IWOMP.
[36] Alejandro Rico,et al. Tibidabo: Making the case for an ARM-based HPC system , 2014, Future Gener. Comput. Syst..
[37] Xiaoqian Zhu,et al. Performance evaluation of hybrid programming patterns for large CPU/GPU heterogeneous clusters , 2012, Comput. Phys. Commun..
[38] Stephen A. Jarvis,et al. Accelerating Hydrocodes with OpenACC, OpenCL and CUDA , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[39] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[40] Miroslav Hajdukovic,et al. MPI-CUDA parallelization of a finite-strip program for geometric nonlinear analysis: A hybrid approach , 2011, Adv. Eng. Softw..
[41] David E. Keyes,et al. KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators , 2014, ACM Trans. Math. Softw..
[42] Amirali Baniasadi,et al. IPMACC: Open Source OpenACC to CUDA/OpenCL Translator , 2014, ArXiv.
[43] Guirong Liu,et al. A face‐based smoothed finite element method (FS‐FEM) for 3D linear and geometrically non‐linear solid mechanics problems using 4‐node tetrahedral elements , 2009 .
[44] Suchuan Dong,et al. Dual-level parallelism for high-order CFD methods , 2004, Parallel Comput..
[45] Sven Karlsson,et al. Towards Unifying OpenMP Under the Task-Parallel Paradigm - Implementation and Performance of the taskloop Construct , 2016, IWOMP.
[46] Jian-Ming Jin,et al. An OpenMP-CUDA Implementation of Multilevel Fast Multipole Algorithm for Electromagnetic Simulation on Multi-GPU Computing Systems , 2013, IEEE Transactions on Antennas and Propagation.
[47] Kengo Nakajima. Three-level hybrid vs. flat MPI on the Earth Simulator: parallel iterative solvers for finite-element method , 2005 .
[48] Franck Cappello,et al. Toward Exascale Resilience , 2009, Int. J. High Perform. Comput. Appl..
[49] Shuangshuang Jin,et al. Thread Group Multithreading: Accelerating the Computation of an Agent-Based Power System Modeling and Simulation Tool -- GridLAB-D , 2014, 2014 47th Hawaii International Conference on System Sciences.