Maarit J. Käpylä | Johannes Pekkilä | Miikka S. Väisälä | Matthias Rheinhardt | Oskar Lappi
[1] Jens-Michael Wierum,et al. On the Quality of Partitions Based on Space-Filling Curves , 2002, International Conference on Computational Science.
[2] M. Rheinhardt,et al. Interaction of Large- and Small-scale Dynamos in Isotropic Turbulent Flows from GPU-accelerated Simulations , 2020, The Astrophysical Journal.
[3] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[4] J. Tait,et al. Challenges and opportunities. , 1996, Journal of psychiatric and mental health nursing.
[5] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[6] M. Mitchell Waldrop,et al. The chips are down for Moore’s law , 2016, Nature.
[7] Vipin Kumar,et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput.
[8] K. E. Jordan,et al. Multiphysics simulations: Challenges and opportunities , 2013, Int. J. High Perform. Comput. Appl.
[9] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.
[10] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006.
[11] Axel Brandenburg,et al. Computational aspects of astrophysical MHD and turbulence , 2001, Advances in Nonlinear Dynamos.
[12] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Peter Lindstrom,et al. Fixed-Rate Compressed Floating-Point Arrays , 2014, IEEE Transactions on Visualization and Computer Graphics.
[14] Marek Blazewicz,et al. Using GPU's to accelerate stencil-based computation kernels for the development of large scale scientific applications on heterogeneous systems , 2012, PPoPP '12.
[15] J. Williamson. Low-storage Runge-Kutta schemes , 1980.
[16] John Shalf,et al. The Cactus Framework and Toolkit: Design and Applications , 2002, VECPAR.
[17] Dietmar Fey,et al. LibGeoDecomp: A Grid-Enabled Library for Geometric Decomposition Codes , 2008, PVM/MPI.
[18] Aamir Zia,et al. Mitigating Memory Wall Effects in High-Clock-Rate and Multicore CMOS 3-D Processor Memory Stacks , 2009, Proceedings of the IEEE.
[19] Joo-Young Kim,et al. A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs , 2015, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines.
[20] Jonathan Ragan-Kelley. Decoupling algorithms from the organization of computation for high performance image processing , 2014.
[21] Joseph E. Flaherty,et al. Hierarchical Partitioning and Dynamic Load Balancing for Scientific Computation , 2004, PARA.
[22] Dhabaleswar K. Panda,et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs , 2013, 2013 42nd International Conference on Parallel Processing.
[23] Omer Anjum,et al. Methods for compressible fluid simulation on GPUs using high-order finite differences , 2017, Comput. Phys. Commun.
[24] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[25] P. Benítez-Llambay,et al. FARGO3D: A New GPU-Oriented MHD Code , 2016, 1602.02359.
[26] Jean Roman,et al. SCOTCH: A Software Package for Static Mapping by Dual Recursive Bipartitioning of Process and Architecture Graphs , 1996, HPCN Europe.
[27] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969.
[28] Satoshi Matsuoka,et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[29] Berk Hess,et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers , 2015.
[30] Qi Li,et al. Silicon Photonics for Exascale Systems , 2014, Journal of Lightwave Technology.
[31] Carole-Jean Wu,et al. MCM-GPU: Multi-chip-module GPUs for continued performance scalability , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[32] Laxmikant V. Kalé,et al. Periodic hierarchical load balancing for large supercomputers , 2011, Int. J. High Perform. Comput. Appl.
[33] John Shalf,et al. The future of computing beyond Moore’s Law , 2020, Philosophical Transactions of the Royal Society A.
[34] Freddie D. Witherden,et al. PyFR: An open source framework for solving advection-diffusion type problems on streaming architectures using the flux reconstruction approach , 2013, Comput. Phys. Commun.
[35] David A. Patterson,et al. Latency lags bandwith , 2004, CACM.
[36] Frank H. Shu,et al. The physics of astrophysics , 1992.
[37] James Demmel,et al. the Parallel Computing Landscape , 2022 .
[38] Rolf Niedermeier,et al. Towards optimal locality in mesh-indexings , 1997, Discret. Appl. Math..
[39] David Kaeli,et al. Exploiting Adaptive Data Compression to Improve Performance and Energy-Efficiency of Compute Workloads in Multi-GPU Systems , 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).