Auto-tuned nested parallelism: A way to reduce the execution time of scientific software in NUMA systems
Javier Cuenca | Domingo Giménez | Luis-Pedro García | Jesús Cámara