Layout-oblivious compiler optimization for matrix computations
Qing Yi | Xiaobing Feng | Jingling Xue | Huimin Cui
[1] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[2] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Yang Yang,et al. Automatic Library Generation for BLAS3 on GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[4] Xiaoning Ding,et al. ULCC: a user-level facility for optimizing shared cache performance on multicores , 2011, PPoPP '11.
[5] Corina S. Pasareanu,et al. A survey of new trends in symbolic execution for software testing and analysis , 2009, International Journal on Software Tools for Technology Transfer.
[6] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[7] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[8] Kleanthis Psarris,et al. Enhancing the Role of Inlining in Effective Interprocedural Parallelization , 2011, 2011 International Conference on Parallel Processing.
[9] Minyi Guo,et al. Enabling loop fusion and tiling for cache performance by fixing fusion-preventing data dependences , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[10] Dongrui Fan,et al. Extendable pattern-oriented optimization directives , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[11] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[12] Kurt S. Riedel,et al. Banded matrix fraction representation of triangular input normal pairs , 2001, IEEE Trans. Autom. Control..
[13] Jack J. Dongarra,et al. A set of level 3 basic linear algebra subprograms , 1990, TOMS.
[14] Ramesh C. Agarwal,et al. A Scalable Parallel Block Algorithm for Band Cholesky Factorization , 1995, PPSC.
[15] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[16] Chun Chen,et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy , 2005, International Symposium on Code Generation and Optimization.
[17] Lori A. Clarke,et al. A program testing system , 1976, ACM '76.
[18] Rudolf Eigenmann,et al. Idiom recognition in the Polaris parallelizing compiler , 1995, ICS '95.
[19] William M. Pottenger. Induction Variable Substitution And Reduction Recognition In The Polaris Parallelizing Compiler , 1995 .
[20] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45.
[21] Allen,et al. Optimizing Compilers for Modern Architectures , 2004 .
[22] Fred G. Gustavson,et al. A recursive formulation of Cholesky factorization of a matrix in packed storage , 2001, TOMS.
[23] Qing Yi,et al. Automated programmable control and parameterization of compiler optimizations , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[24] Thomas E. Cheatham,et al. Symbolic Evaluation and the Analysis of Programs , 1979, IEEE Transactions on Software Engineering.
[25] Keshav Pingali,et al. The tao of parallelism in algorithms , 2011, PLDI '11.
[26] Markus Mock,et al. DyC: an expressive annotation-directed dynamic compiler for C , 2000, Theor. Comput. Sci..
[27] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[28] Dongrui Fan,et al. Extendable pattern-oriented optimization directives , 2011, CGO 2011.
[29] Qing Yi,et al. POET: a scripting language for applying parameterized source‐to‐source program transformations , 2012, Softw. Pract. Exp..
[30] David A. Padua,et al. Semantic Inlining - the Compiler Support for Java in Technical Computing , 1999, PPSC.
[31] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[32] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[33] Calvin Lin,et al. Broadway: A Compiler for Exploiting the Domain-Specific Semantics of Software Libraries , 2005, Proceedings of the IEEE.
[34] Guang R. Gao,et al. Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences , 2006, Euro-Par.
[35] Murali Sitaraman,et al. A Data Abstraction Alternative to Data Structure/Algorithm Modularization , 1998, Generic Programming.
[36] Julien Langou,et al. Rectangular full packed format for cholesky's algorithm: factorization, solution, and inversion , 2009, TOMS.