Automatic algorithm derivation and exploration in linear algebra for parallelism and locality

Alexandre Xavier Duchateau

[1] Robert A. van de Geijn, et al. The science of deriving dense linear algebra algorithms, 2005, TOMS.

[2] Saman P. Amarasinghe, et al. Meta optimization: improving compiler heuristics with machine learning, 2003, PLDI '03.

[3] Julien Langou, et al. The Impact of Multicore on Math Software, 2006, PARA.

[4] J. Ramanujam, et al. Tiling multidimensional iteration spaces for nonshared memory machines, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[5] Paolo Bientinesi, et al. Knowledge-Based Automatic Generation of Partitioned Matrix Expressions, 2011, CASC.

[6] William Jalby, et al. Loop Optimization using Hierarchical Compilation and Kernel Decomposition, 2007, International Symposium on Code Generation and Optimization (CGO'07).

[7] Isak Jonsson, et al. Recursive blocked algorithms for solving triangular systems—Part I: one-sided and coupled Sylvester-type matrix equations, 2002, TOMS.

[8] Keshav Pingali, et al. Data-Centric Transformations for Locality Enhancement, 2001, International Journal of Parallel Programming.

[9] David A. Padua, et al. A Parallel Numerical Solver Using Hierarchically Tiled Arrays, 2010, LCPC.

[10] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.

[11] Fred G. Gustavson, et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms, 1997, IBM J. Res. Dev.

[12] Xing Zhou, et al. Hierarchical overlapped tiling, 2012, CGO '12.

[13] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.

[14] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[15] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.

[16] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.

[17] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.

[18] Ahmed H. Sameh, et al. A parallel hybrid banded system solver: the SPIKE algorithm, 2006, Parallel Comput.