David E. Keyes | Alexey L. Lastovetsky | Vladimir Rychkov | Amani AlOnazi
[1] Alexey L. Lastovetsky, et al. Accurate Heterogeneous Communication Models and a Software Tool for Their Efficient Estimation, 2010, Int. J. High Perform. Comput. Appl.
[2] Alexey L. Lastovetsky, et al. HeteroMPI: Towards a message-passing library for heterogeneous networks of computers, 2006, J. Parallel Distributed Comput.
[3] Enhua Wu, et al. Real-time 3D fluid simulation on GPU with complex obstacles, 2004, 12th Pacific Conference on Computer Graphics and Applications (PG 2004).
[4] Alexey L. Lastovetsky, et al. Using Multidimensional Solvers for Optimal Data Partitioning on Dedicated Heterogeneous HPC Platforms, 2011, PaCT.
[5] Jens Jägersküpper, et al. A Novel Shared-Memory Thread-Pool Implementation for Hybrid Parallel CFD Solvers, 2011, Euro-Par.
[6] Alexey L. Lastovetsky, et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Platforms with Memory Heterogeneity, 2010, Euro-Par Workshops.
[7] Ziming Zhong, et al. FuPerMod: a software tool for the optimization of data-parallel applications on heterogeneous platforms, 2014, The Journal of Supercomputing.
[8] Luke N. Olson, et al. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods, 2012, SIAM J. Sci. Comput.
[9] Alexey L. Lastovetsky, et al. Data partitioning for multiprocessors with memory heterogeneity and memory constraints, 2005, Sci. Program.
[10] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[11] Emil M. Constantinescu, et al. Multiphysics simulations, 2013, HiPC 2013.
[12] Alexey L. Lastovetsky, et al. High Performance Heterogeneous Computing, 2009, Wiley series on parallel and distributed computing.
[13] Emmanuel Jeannot, et al. Design and Optimization of Scientific Applications for Highly Heterogeneous and Hierarchical HPC Platforms Using Functional Computation Performance Models, 2014, HiPC 2014.
[14] Jack Dongarra, et al. Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators, 2012.
[15] William Gropp, et al. Domain decomposition on parallel computers, 1989, IMPACT Comput. Sci. Eng.
[16] Emmanuel Agullo, et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[17] A. D. Gosman, et al. The computation of compressible and incompressible recirculating flows by a non-iterative implicit scheme, 1986.
[18] Paride Dagna, et al. Evaluation of Multi-threaded OpenFOAM Hybridization for Massively Parallel Architectures, 2013, Partnership for Advanced Computing in Europe.
[19] Constantine D. Polychronopoulos, et al. Parallel programming and compilers, 1988.
[20] Katarzyna Zadarnowska, et al. Complete PISO and SIMPLE solvers on Graphics Processing Units, 2012, ArXiv.
[21] H. T. Kung, et al. Performance Gains in Conjugate Gradient Computation with Linearly Connected GPU Multiprocessors, 2012.
[22] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[23] François Pellegrini, et al. PT-Scotch: A tool for efficient parallel graph ordering, 2008, Parallel Comput.
[24] William Gropp, et al. High-performance parallel implicit CFD, 2001, Parallel Comput.
[25] Farshad Khunjush, et al. Optimization of OpenFOAM's linear solvers on emerging multi-core platforms, 2011, Proceedings of 2011 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.
[26] Alexey L. Lastovetsky. Heterogeneity in parallel and distributed computing, 2013, J. Parallel Distributed Comput.
[27] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[28] Cédric Augonnet, et al. StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, 2010.
[29] Alexey L. Lastovetsky, et al. Distributed Data Partitioning for Heterogeneous Processors Based on Partial Estimation of Their Functional Performance Models, 2009, Euro-Par Workshops.
[30] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.
[31] Wim Vanroose, et al. Hiding global synchronization latency in the preconditioned Conjugate Gradient algorithm, 2014, Parallel Comput.
[32] Alexey L. Lastovetsky, et al. Column-Based Matrix Partitioning for Parallel Matrix Multiplication on Heterogeneous Processors Based on Functional Performance Models, 2011, Euro-Par Workshops.
[33] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[34] Ziming Zhong, et al. FuPerMod: A Framework for Optimal Data Partitioning for Parallel Scientific Applications on Dedicated Heterogeneous HPC Platforms, 2013, PaCT.
[35] Alexey L. Lastovetsky, et al. Heterogeneous Distribution of Computations Solving Linear Algebra Problems on Networks of Heterogeneous Computers, 2001, J. Parallel Distributed Comput.
[36] Ziming Zhong, et al. Data Partitioning on Heterogeneous Multicore Platforms, 2011, 2011 IEEE International Conference on Cluster Computing.
[37] Chao-Tung Yang, et al. Hybrid CUDA, OpenMP, and MPI parallel programming on multicore GPU clusters, 2011, Comput. Phys. Commun.
[38] Hrvoje Jasak, et al. A tensorial approach to computational continuum mechanics using object-oriented techniques, 1998.
[39] David Kirk, et al. NVIDIA CUDA software and GPU parallel computing architecture, 2007, ISMM '07.
[40] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.
[41] G. R. Mudalige, et al. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures, 2012, 2012 Innovative Parallel Computing (InPar).
[42] Alexey L. Lastovetsky, et al. Data Partitioning with a Functional Performance Model of Heterogeneous Processors, 2007, Int. J. High Perform. Comput. Appl.
[43] Alexey Lastovetsky, et al. Towards a Realistic Performance Model for Networks of Heterogeneous Computers, 2005.
[44] Joel H. Ferziger, et al. Computational methods for fluid dynamics, 1996.
[45] Brett A. Becker, et al. Partitioning for Parallel Matrix-Matrix Multiplication with Heterogeneous Processors: The Optimal Solution, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[46] Alexey L. Lastovetsky, et al. Design and implementation of self-adaptable parallel algorithms for scientific computing on highly heterogeneous HPC platforms, 2011, ArXiv.
[47] Yuan Liu. Hybrid Parallel Computation of OpenFOAM Solver on Multi-Core Cluster Systems, 2011.
[48] Alexey L. Lastovetsky, et al. Building the functional performance model of a processor, 2006, SAC.
[49] Naga K. Govindaraju, et al. GPGPU: general-purpose computation on graphics hardware, 2006, SC.
[50] J. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994.
[51] Satoshi Matsuoka, et al. High performance conjugate gradient solver on multi-GPU clusters using hypergraph partitioning, 2010, Computer Science - Research and Development.
[52] James R. Stewart, et al. A framework approach for developing parallel adaptive multiphysics applications, 2004.
[53] Wolfgang Straßer, et al. A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[54] Hrvoje Jasak, et al. Development of a Generalized Grid Mesh Interface for Turbomachinery simulations with OpenFOAM, 2008.
[55] Ziming Zhong, et al. Data Partitioning on Heterogeneous Multicore and Multi-GPU Systems Using Functional Performance Models of Data-Parallel Applications, 2012, 2012 IEEE International Conference on Cluster Computing.
[56] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[57] Ian Buck, et al. GPU computing with NVIDIA CUDA, 2007, SIGGRAPH Courses.
[58] Alexey L. Lastovetsky, et al. Heterogeneous Distribution of Computations While Solving Linear Algebra Problems on Networks of Heterogeneous Computers, 1999, HPCN Europe.
[59] Alexey Lastovetsky. Parallel Simulation of Oil Extraction on Heterogeneous Networks of Computers, 2012.
[60] Manolis Papadrakakis, et al. A new era in scientific computing: Domain decomposition methods in hybrid CPU-GPU architectures, 2011.
[61] Anthony T. Chronopoulos, et al. s-step iterative methods for symmetric linear systems, 1989.
[62] Rajesh Bordawekar, et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs using Compile-time and Run-time Strategies, 2008.
[63] F. Harlow, et al. Numerical Calculation of Time-Dependent Viscous Incompressible Flow of Fluid with Free Surface, 1965.
[64] Guillaume Mercier, et al. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[65] Kevin Skadron, et al. A performance study of general-purpose applications on graphics processors using CUDA, 2008, J. Parallel Distributed Comput.
[66] Robert M. Farber, et al. CUDA Application Design and Development, 2011.
[67] Alexey L. Lastovetsky, et al. Data partitioning with a realistic performance model of networks of heterogeneous computers with task size limits, 2004, Third International Symposium on Parallel and Distributed Computing/Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks.
[68] Alexey L. Lastovetsky, et al. HeteroMPI+ScaLAPACK: Towards a ScaLAPACK (Dense Linear Solvers) on Heterogeneous Networks of Computers, 2006, HiPC.
[69] Anoop Gupta, et al. Parallel computer architecture - a hardware/software approach, 1998.
[70] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[71] Alexey L. Lastovetsky, et al. Two-Dimensional Matrix Partitioning for Parallel Computing on Heterogeneous Processors Based on Their Functional Performance Models, 2009, Euro-Par Workshops.
[72] Leonel Sousa, et al. Hierarchical Partitioning Algorithm for Scientific Computing on Highly Heterogeneous CPU + GPU Clusters, 2012, Euro-Par.
[73] Toni Cortes, et al. PARAVER: A Tool to Visualize and Analyze Parallel Code, 2007.
[74] Hans Werner Meuer, et al. Top500 Supercomputer Sites, 1997.
[75] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[76] Michael Garland, et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.
[77] Alexey L. Lastovetsky, et al. Data distribution for dense factorization on computers with memory heterogeneity, 2007, Parallel Comput.
[78] Jack J. Dongarra, et al. A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators, 2010, VECPAR.
[79] Guillaume Caumon, et al. Concurrent number cruncher: a GPU implementation of a general sparse linear solver, 2009, Int. J. Parallel Emergent Distributed Syst.
[80] Alexey L. Lastovetsky, et al. Dynamic Load Balancing of Parallel Computational Iterative Routines on Highly Heterogeneous HPC Platforms, 2011, Parallel Process. Lett.