Automatic algorithm derivation and exploration in linear algebra for parallelism and locality
[1] Robert A. van de Geijn, et al. The science of deriving dense linear algebra algorithms, 2005, TOMS.
[2] Saman P. Amarasinghe, et al. Meta optimization: improving compiler heuristics with machine learning, 2003, PLDI '03.
[3] Julien Langou, et al. The Impact of Multicore on Math Software, 2006, PARA.
[4] J. Ramanujam, et al. Tiling multidimensional iteration spaces for nonshared memory machines, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[5] Paolo Bientinesi, et al. Knowledge-Based Automatic Generation of Partitioned Matrix Expressions, 2011, CASC.
[6] William Jalby, et al. Loop Optimization using Hierarchical Compilation and Kernel Decomposition, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[7] Isak Jonsson, et al. Recursive blocked algorithms for solving triangular systems—Part I: one-sided and coupled Sylvester-type matrix equations, 2002, TOMS.
[8] Keshav Pingali, et al. Data-Centric Transformations for Locality Enhancement, 2001, International Journal of Parallel Programming.
[9] David A. Padua, et al. A Parallel Numerical Solver Using Hierarchically Tiled Arrays, 2010, LCPC.
[10] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[11] Fred G. Gustavson, et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms, 1997, IBM J. Res. Dev.
[12] Xing Zhou, et al. Hierarchical overlapped tiling, 2012, CGO '12.
[13] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[14] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[15] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[16] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[17] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[18] Ahmed H. Sameh, et al. A parallel hybrid banded system solver: the SPIKE algorithm, 2006, Parallel Comput.