Autotuning in High-Performance Computing Applications
Prasanna Balaprakash | Todd Gamblin | Boyana Norris | Richard Vuduc | Jeffrey K. Hollingsworth | Mary W. Hall | Jack Dongarra
[1] Richard D. Hornung,et al. The RAJA Portability Layer: Overview and Status, 2014.
[2] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[3] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[4] Victor Eijkhout,et al. Proof-Driven Derivation of Krylov Solver Libraries, 2010.
[5] Jimeng Sun,et al. Model-Driven Sparse CP Decomposition for Higher-Order Tensors , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[6] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[7] P. Sadayappan,et al. Stencil-Aware GPU Optimization of Iterative Solvers , 2013, SIAM J. Sci. Comput.
[8] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[9] Chun Chen,et al. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology , 2010, Software Automatic Tuning, From Concepts to State-of-the-Art Results.
[10] Ananta Tiwari,et al. Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[11] Samuel Williams,et al. Optimizing Sparse Matrix-Multiple Vectors Multiplication for Nuclear Configuration Interaction Calculations , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[12] Allen D. Malony,et al. Design and implementation of a parallel performance data management framework , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[13] William J. Dally,et al. A tuning framework for software-managed memory hierarchies , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[14] Michael Garland,et al. Nitro: A Framework for Adaptive Code Variant Tuning , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[15] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev.
[16] James Demmel,et al. Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology , 2014, ICS 25th Anniversary.
[17] Jack J. Dongarra,et al. A proposal for a set of level 3 basic linear algebra subprograms , 1987, SGNM.
[18] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[19] David D. Cox,et al. Machine learning for predictive auto-tuning with boosted regression trees , 2012, 2012 Innovative Parallel Computing (InPar).
[20] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[21] Katherine A. Yelick,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, SIAM Conference on Parallel Processing for Scientific Computing.
[22] Daniel Sunderland,et al. Kokkos, a Manycore Device Performance Portability Library for C++ HPC Applications, 2014.
[23] Prasanna Balaprakash,et al. Can search algorithms save large-scale automatic performance tuning? , 2011, ICCS.
[24] Prasanna Balaprakash,et al. Exploiting Performance Portability in Search Algorithms for Autotuning , 2016, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[25] Martin Schulz,et al. Caliper: Performance Introspection for HPC Software Stacks , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[26] Elizabeth R. Jessup,et al. Lighthouse: an automated solver selection tool , 2015, SE-HPCCSE@SC.
[27] Chun Chen,et al. Model-guided empirical optimization for memory hierarchy, 2007.
[28] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[29] Hiroaki Kobayashi,et al. Xevolver: An XML-based code translation framework for supporting HPC application migration , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[30] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[31] Vahid Tabatabaee,et al. Parallel Parameter Tuning for Applications with Performance Variability , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[32] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[33] Prasanna Balaprakash,et al. An Experimental Study of Global and Local Search Algorithms in Empirical Performance Tuning , 2012, VECPAR.
[34] Richard W. Vuduc,et al. A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[35] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[36] Prasanna Balaprakash,et al. Generating Efficient Tensor Contractions for GPUs , 2015, 2015 44th International Conference on Parallel Processing.
[37] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[38] Boyana Norris,et al. Autotuning Stencil-Based Computations on GPUs , 2012, 2012 IEEE International Conference on Cluster Computing.
[39] I-Hsin Chung,et al. A Case Study Using Automatic Performance Tuning for Large-Scale Scientific Programs , 2006, 2006 15th IEEE International Conference on High Performance Distributed Computing.
[40] Jack J. Dongarra,et al. A comparison of search heuristics for empirical code optimization , 2008, 2008 IEEE International Conference on Cluster Computing.
[41] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[42] Elizabeth R. Jessup,et al. Performance-Based Numerical Solver Selection in the Lighthouse Framework , 2016, SIAM J. Sci. Comput.
[43] Alexander Heinecke,et al. LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[44] Michael Voss,et al. High-level adaptive program optimization with ADAPT , 2001, PPoPP '01.
[45] Ignacio Laguna,et al. Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[46] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[47] Paolo Bientinesi,et al. Application-tailored linear algebra algorithms , 2012, Int. J. High Perform. Comput. Appl.
[48] Kalyan Veeramachaneni,et al. Autotuning algorithmic choice for input sensitivity , 2015, PLDI.
[49] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[50] Bronis R. de Supinski,et al. The Spack package manager: bringing order to HPC software chaos , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[51] William Gropp,et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries , 1997, SciTools.
[52] Ken Kennedy,et al. Automatic tuning of whole applications using direct search and a performance-based transformation system , 2006, The Journal of Supercomputing.
[53] Elizabeth R. Jessup,et al. Lighthouse: a taxonomy-based solver selection tool , 2015, SEPS@SPLASH.
[54] E. Im,et al. Optimizing Sparse Matrix Vector Multiplication on SMP , 1999, PPSC.
[55] Jeffrey K. Hollingsworth,et al. Computation-communication overlap and parameter auto-tuning for scalable parallel 3-D FFT , 2016, J. Comput. Sci.
[56] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput.
[57] Tamara G. Kolda,et al. An overview of the Trilinos project , 2005, TOMS.
[58] Chun Chen,et al. Auto-tuning full applications: A case study , 2011, Int. J. High Perform. Comput. Appl.
[59] Chun Chen,et al. Speeding up Nek5000 with autotuning and specialization , 2010, ICS '10.
[60] Barton P. Miller,et al. Dynamic program instrumentation for scalable performance tools , 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[61] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[62] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[63] Samuel Williams,et al. Optimization of a lattice Boltzmann computation on state-of-the-art multicore platforms , 2009, J. Parallel Distributed Comput.
[64] William Gropp,et al. Annotations for Productivity and Performance Portability, 2007.
[65] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[66] Khalid Ahmad,et al. Optimizing LOBPCG: Sparse Matrix Loop and Data Transformations in Action , 2016, LCPC.
[67] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[68] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[69] Boyana Norris,et al. Generating Customized Sparse Eigenvalue Solutions with Lighthouse, 2014.
[70] Jack J. Dongarra,et al. Autotuning GEMM Kernels for the Fermi GPU , 2012, IEEE Transactions on Parallel and Distributed Systems.
[71] Kunle Olukotun,et al. A Heterogeneous Parallel Framework for Domain-Specific Languages , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.