Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models
David E. Bernholdt | Robert J. Harrison | Sriram Krishnamoorthy | Venkatesh Choppella | J. Ramanujam | Gerald Baumgartner | Chi-Chung Lam | P. Sadayappan | So Hirata | Xiaoyang Gao | Daniel Cociorva | Russell M. Pitzer | Alina Bibireata | Sandhya Krishnan | Alexander Sibiryakov | Alexander A. Auer | Marcel Nooijen | Qingda Lu
[1] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[2] Ken Kennedy, et al. Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries, 2001, J. Parallel Distributed Comput.
[3] P. Kollman, et al. Encyclopedia of computational chemistry, 1998.
[4] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[5] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[6] David E. Bernholdt, et al. Memory-Constrained Data Locality Optimization for Tensor Contractions, 2003, LCPC.
[7] Robert J. Harrison, et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers, 1994, Proceedings of Supercomputing '94.
[8] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[9] David A. Padua, et al. SPL: a language and compiler for DSP algorithms, 2001, PLDI '01.
[10] Tarek S. Abdelrahman, et al. Fusion of Loops for Parallelism and Locality, 1997, IEEE Trans. Parallel Distributed Syst.
[11] Chi-Chung Lam, et al. On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution, 1997, Parallel Process. Lett.
[12] Keshav Pingali, et al. A case for source-level transformations in MATLAB, 1999.
[13] Clemens Grelck, et al. With-Loop Fusion for Data Locality and Parallelism, 2005, IFL.
[14] Keshav Pingali, et al. A case for source-level transformations in MATLAB, 1999, DSL '99.
[15] Keshav Pingali, et al. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests, 2000.
[16] Wei Li, et al. Compiling for NUMA Parallel Machines, 1993.
[17] Mark S. Gordon, et al. General atomic and molecular electronic structure system, 1993, J. Comput. Chem.
[18] Alan Edelman, et al. Parallel MATLAB: Doing it Right, 2005, Proceedings of the IEEE.
[19] So Hirata, et al. Third-order Douglas-Kroll relativistic coupled-cluster theory through connected single, double, triple, and quadruple substitutions: applications to diatomic and triatomic hydrides, 2004, The Journal of Chemical Physics.
[20] Keshav Pingali, et al. Data-centric multi-level blocking, 1997, PLDI '97.
[21] Larry Carter, et al. Quantifying the Multi-Level Nature of Tiling Interactions, 1997, International Journal of Parallel Programming.
[22] Keshav Pingali, et al. Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests, 2001, International Journal of Parallel Programming.
[23] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[24] Cheng Wang, et al. Locality Enhancement by Array Contraction, 2001, LCPC.
[25] J. Ramanujam, et al. Loop optimization for a class of memory-constrained computations, 2001, ICS '01.
[26] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.
[27] J. Ramanujam, et al. Memory-Constrained Communication Minimization for a Class of Array Computations, 2002, LCPC.
[28] Æleen Frisch, et al. Exploring chemistry with electronic structure methods, 1996.
[29] David A. Padua, et al. A MATLAB to Fortran 90 translator and its effectiveness, 1996, ICS '96.
[30] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[31] Chi-Chung Lam, et al. Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines, 1997, PPSC.
[32] Anne Mignotte, et al. Loop alignment for memory accesses optimization, 1999, Proceedings 12th International Symposium on System Synthesis.
[33] Mahmut T. Kandemir, et al. Reducing memory requirements of nested loops for embedded systems, 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No. 01CH37232).
[34] Keshav Pingali, et al. An experimental evaluation of tiling and shackling for memory hierarchy management, 1999, ICS '99.
[35] Keshav Pingali, et al. High-level semantic optimization of numerical codes, 1999, ICS '99.
[36] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[37] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[38] Jan M. L. Martin. Benchmark Studies on Small Molecules, 2002.
[39] Chi-Chung Lam, et al. Performance optimization of a class of loops implementing multidimensional integrals, 1999.
[40] Kathryn S. McKinley, et al. Loop Fusion for Data Locality and Parallelism, 1996.
[41] Francky Catthoor, et al. Data dependency size estimation for use in memory optimization, 2003, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.
[42] S. Hirata. Tensor Contraction Engine: Abstraction and Automated Parallel Implementation of Configuration-Interaction, Coupled-Cluster, and Many-Body Perturbation Theories, 2003.
[43] David E. Bernholdt, et al. Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms, 2003, HiPC.
[44] Gerald Baumgartner, et al. Memory-Optimal Evaluation of Expression Trees Involving Large Objects, 1999, HiPC.
[45] David E. Bernholdt, et al. Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization, 2001, HiPC.
[46] Victor Eijkhout, et al. Self-Adapting Linear Algebra Algorithms and Software, 2005, Proceedings of the IEEE.
[47] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No. 98CH36181).
[48] Mahmut T. Kandemir, et al. Estimating and reducing the memory requirements of signal processing codes for embedded systems, 2006, IEEE Transactions on Signal Processing.
[49] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, ACM TOPLAS.
[50] R. C. Whaley, et al. Automatically Tuned Linear Algebra Software (ATLAS), 2011, Encyclopedia of Parallel Computing.
[51] Gerald Baumgartner, et al. Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals, 1999, LCPC.
[52] Ken Kennedy, et al. Telescoping Languages: A System for Automatic Generation of Domain Languages, 2005, Proceedings of the IEEE.
[53] M. Head-Gordon, et al. A fifth-order perturbation comparison of electron correlation theories, 1989.
[54] Lynn Elliot Cannon, et al. A cellular computer to implement the Kalman filter algorithm, 1969.
[55] Paul N. Hilfinger, et al. Better Tiling and Array Contraction for Compiling Scientific Programs, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[56] J. Ramanujam, et al. Global communication optimization for tensor contraction expressions under memory constraints, 2003, Proceedings International Parallel and Distributed Processing Symposium.
[57] Francky Catthoor, et al. Custom Memory Management Methodology, 1998, Springer US.
[58] Gustavo E. Scuseria, et al. Achieving Chemical Accuracy with Coupled-Cluster Theory, 1995.
[59] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[60] Leonidas J. Guibas, et al. Compilation and delayed evaluation in APL, 1978, POPL.
[61] Monica S. Lam, et al. Blocking and array contraction across arbitrarily nested loops using affine partitioning, 2001, PPoPP '01.
[62] Chau-Wen Tseng, et al. A Comparison of Compiler Tiling Algorithms, 1999, CC.
[63] Yonghong Song, et al. Compiler algorithms for efficient use of memory systems, 2000.
[64] David E. Bernholdt, et al. Space-time trade-off optimization for a class of electronic structure calculations, 2002, PLDI '02.
[65] Larry Carter, et al. Schedule-independent storage mapping for loops, 1998, ASPLOS VIII.
[66] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[67] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[68] Cheng Wang, et al. Data locality enhancement by memory reduction, 2001, ICS '01.
[69] Robert J. Harrison, et al. Shared Memory Programming in Metacomputing Environments: The Global Array Approach, 1997, The Journal of Supercomputing.
[70] David A. Padua, et al. Searching for the Best FFT Formulas with the SPL Compiler, 2000, LCPC.