Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications
Hatem Ltaief | David Keyes | Kadir Akbudak | George Bosilca | Qinglei Cao | Jack Dongarra | Yu Pei | Aleksandr Mikhalev
[1] David E. Keyes et al. Exploiting Data Sparsity for Large-Scale Matrix Computations, 2018, Euro-Par.
[2] David E. Keyes et al. Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations, 2018, 2018 IEEE International Conference on Cluster Computing (CLUSTER).
[3] E. Tyrtyshnikov. Mosaic-Skeleton approximations, 1996.
[4] George Bosilca et al. Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[5] Thomas Hérault et al. PTG: An Abstraction for Unhindered Parallelism, 2014, 2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing.
[6] Alexander Aiken et al. Legion: Expressing locality and independence with logical regions, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Ying Sun et al. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets, 2016.
[8] Eric Darve et al. A fast block low-rank dense solver with applications to finite-element matrices, 2014, J. Comput. Phys.
[9] Susan Coghlan et al. Operating system issues for petascale systems, 2006, OPSR.
[10] Mihai Anitescu et al. Scalable Gaussian Process Computations Using Hierarchical Matrices, 2018, Journal of Computational and Graphical Statistics.
[11] Thomas Hérault et al. Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[12] Dan Tsafrir et al. System noise, OS clock ticks, and fine-grained parallel applications, 2005, ICS '05.
[13] Jianlin Xia et al. A Superfast Structured Solver for Toeplitz Linear Systems via Randomized Sampling, 2012, SIAM J. Matrix Anal. Appl.
[14] Michael L. Stein et al. Limitations on low rank approximations for covariance matrices of spatial data, 2014.
[15] Emmanuel Agullo et al. Achieving High Performance on Supercomputers with a Sequential Task-based Programming Model, 2017.
[16] Steffen Börm et al. Data-sparse Approximation by Adaptive ℋ²-Matrices, 2002, Computing.
[17] Alejandro Duran et al. A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks, 2009, International Journal of Parallel Programming.
[18] James Reinders et al. Intel threading building blocks: outfitting C++ for multi-core processor parallelism, 2007.
[19] S. Börm. Efficient Numerical Methods for Non-local Operators, 2010.
[20] Nathan Halko et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, 2009, SIAM Rev.
[21] George Bosilca et al. Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[22] Patrick R. Amestoy et al. Performance and Scalability of the Block Low-Rank Multifrontal Factorization on Multicore Architectures, 2019, ACM Trans. Math. Softw.
[23] Philippe Olivier Alexandre Navaux et al. Performance Improvement of Stencil Computations for Multi-core Architectures based on Machine Learning, 2017, ICCS.
[24] George Bosilca et al. PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution, 2015, 2015 IEEE International Conference on Cluster Computing.
[25] Siegfried Benkner et al. Implementing the Open Community Runtime for Shared-Memory and Distributed-Memory Systems, 2016, 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP).
[26] Jack J. Dongarra et al. Accelerating NWChem Coupled Cluster through dataflow-based execution, 2018, Int. J. High Perform. Comput. Appl.
[27] Mario Bebendorf et al. Hierarchical Matrices: A Means to Efficiently Solve Elliptic Boundary Value Problems, 2008.
[28] Robert A. van de Geijn et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.
[29] Ichitaro Yamazaki et al. Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization, 2019, 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM).
[30] Yu Pei et al. Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools, 2019, 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools).
[31] G. Peano. Sur une courbe, qui remplit toute une aire plane, 1890.
[32] Jack Dongarra et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[33] Bradley C. Kuszmaul et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[34] W. Hackbusch et al. Hierarchical Matrices: Algorithms and Analysis, 2015.
[35] Cédric Augonnet et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[36] Chenhan D. Yu et al. Distributed-Memory Hierarchical Compression of Dense SPD Matrices, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] George Bosilca et al. PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability, 2013.
[38] Pradipta De et al. Impact of Noise on Scaling of Collectives: An Empirical Evaluation, 2006, HiPC.
[39] Thomas Hérault et al. Dynamic task discovery in PaRSEC: a data-flow task-based runtime, 2017, ScalA@SC.
[40] Ronald Kriemann et al. H-LU Factorization on Many-Core Systems, 2014.
[41] A. Brandt. Multilevel computations of integral transforms and particle interactions with oscillatory kernels, 1991.
[42] Leslie Greengard et al. A fast algorithm for particle simulations, 1987.
[43] Théo Mary et al. Block Low-Rank multifrontal solvers: complexity, performance, and scalability (Solveurs multifrontaux exploitant des blocs de rang faible: complexité, performance et parallélisme), 2017.
[44] Thomas Hérault et al. PaRSEC: Exploiting Heterogeneity to Enhance Scalability, 2013, Computing in Science & Engineering.
[45] R. Parr. Density-functional theory of atoms and molecules, 1989.
[46] Jean-Yves L'Excellent et al. Improving Multifrontal Methods by Means of Block Low-Rank Representations, 2015, SIAM J. Sci. Comput.
[47] Patrick Amestoy et al. MUMPS: A General Purpose Distributed Memory Sparse Solver, 2000, PARA.
[48] Elisabeth Larsson et al. A task parallel implementation of a scattered node stencil-based solver for the shallow water equations, 2013.
[49] David E. Keyes et al. ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems, 2017, IEEE Transactions on Parallel and Distributed Systems.
[50] Andrew Gordon Wilson et al. Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP), 2015, ICML.
[51] David E. Keyes et al. Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures, 2017, ISC.
[52] Ronald Kriemann et al. ℋ-LU factorization on many-core systems, 2013, Comput. Vis. Sci.
[53] Wolfgang Hackbusch et al. A Sparse Matrix Arithmetic Based on H-Matrices. Part I: Introduction to H-Matrices, 1999, Computing.
[54] Eric Darve et al. An O(N log N) Fast Direct Solver for Partial Hierarchically Semi-Separable Matrices, 2013.
[55] Pieter Ghysels et al. A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization, 2015, ACM Trans. Math. Softw.
[56] David E. Keyes et al. Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope, 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).