Towards making autotuning mainstream
Mary W. Hall | Protonu Basu | Malik Murtaza Khan | Anand Venkat | Shreyas Ramalingam | Manu Shantharam | Saurav Muralidharan | Axel Rivera | Suchit Maindola
[1] Albert Cohen,et al. Iterative optimization in the polyhedral model: part ii, multidimensional time , 2008, PLDI '08.
[2] Richard Johnson,et al. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization , 2003 .
[3] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.
[4] Chun Chen,et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy , 2005, International Symposium on Code Generation and Optimization.
[5] Una-May O'Reilly,et al. An efficient evolutionary algorithm for solving incrementally structured problems , 2011, GECCO '11.
[6] Geri Georg,et al. Set and Relation Manipulation for the Sparse Polyhedral Framework , 2012, LCPC.
[7] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Chun Chen,et al. Model-guided empirical optimization for memory hierarchy , 2007 .
[9] Michel Lemaître,et al. Branch and Bound Algorithm Selection by Performance Prediction , 1998, AAAI/IAAI.
[10] Henry Kautz,et al. Branch and bound algorithm selection by performance prediction , 2001, Conference on Uncertainty in Artificial Intelligence.
[11] William Jalby,et al. Loop Optimization using Hierarchical Compilation and Kernel Decomposition , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[12] I-Hsin Chung,et al. Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[13] Nancy M. Amato,et al. A framework for adaptive algorithm selection in STAPL , 2005, PPoPP.
[14] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[15] Chun Chen,et al. Speeding up Nek5000 with autotuning and specialization , 2010, ICS '10.
[16] Samuel Williams,et al. Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures , 2008 .
[17] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[18] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[19] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[20] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[21] Bradley C. Kuszmaul,et al. The Pochoir stencil compiler , 2011, SPAA '11.
[22] Michael Wolfe,et al. Loop skewing: The wavefront method revisited , 1986, International Journal of Parallel Programming.
[23] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[24] Albert Cohen,et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[25] Archana Ganapathi,et al. A case for machine learning to optimize multicore performance , 2009 .
[26] Qing Yi,et al. POET: a scripting language for applying parameterized source‐to‐source program transformations , 2012, Softw. Pract. Exp.
[27] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[28] Michela Milano,et al. Learning Techniques for Automatic Algorithm Portfolio Selection , 2004, ECAI.
[29] Markus Püschel,et al. Computer Generation of General Size Linear Transform Libraries , 2009, 2009 International Symposium on Code Generation and Optimization.
[30] Chun Chen,et al. Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[31] Ken Kennedy,et al. Profitable loop fusion and tiling using model-driven empirical search , 2006, ICS '06.
[32] Larry Carter,et al. Rescheduling for Locality in Sparse Matrix Computations , 2001, International Conference on Computational Science.
[33] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[34] Uday Bondhugula,et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories , 2008, PPoPP.
[35] Chun Chen,et al. Auto-tuning full applications: A case study , 2011, Int. J. High Perform. Comput. Appl.
[36] Haipeng Guo. A Bayesian Approach for Automatic Algorithm Selection , 2003 .
[37] Saman P. Amarasinghe,et al. Meta optimization: improving compiler heuristics with machine learning , 2003, PLDI '03.
[38] David A. Padua,et al. Optimizing sorting with genetic algorithms , 2005, International Symposium on Code Generation and Optimization.
[39] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for GPGPUs , 2008, ICS '08.
[40] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[41] Robert Glück. A self‐applicable online partial evaluator for recursive flowchart languages , 2012, Softw. Pract. Exp.
[42] Judea Pearl,et al. Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.
[43] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[44] Ananta Tiwari,et al. Online Adaptive Code Generation and Tuning , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[45] Andy Nisbet,et al. GAPS: Iterative Feedback Directed Parallelisation Using Genetic Algorithms , 2000 .
[46] J. Ramanujam,et al. Parameterized tiling revisited , 2010, CGO '10.
[47] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[48] David A. Padua,et al. A dynamically tuned sorting library , 2004, International Symposium on Code Generation and Optimization (CGO 2004).
[49] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[50] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[51] Ken Kennedy,et al. Improving cache performance in dynamic applications through data and computation reorganization at run time , 1999, PLDI '99.
[52] Frank Mueller,et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters , 2012, CGO '12.
[53] D. Merrill,et al. Policy-based tuning for performance portability and library co-optimization , 2012, 2012 Innovative Parallel Computing (InPar).
[54] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl.
[55] P. Sadayappan,et al. Annotation-based empirical performance tuning using Orio , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[56] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[57] Lars Kotthoff,et al. A Preliminary Evaluation of Machine Learning in Algorithm Selection for Search Problems , 2011, SOCS.
[58] Paul D. Hovland,et al. Generating Performance Bounds from Source Code , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[59] Yoav Shoham,et al. A portfolio approach to algorithm selection , 2003, IJCAI 2003.
[60] Michael Voss,et al. High-level adaptive program optimization with ADAPT , 2001, PPoPP '01.
[61] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[62] Babak Falsafi,et al. Reference idempotency analysis: a framework for optimizing speculative execution , 2001, PPoPP '01.
[63] Michail G. Lagoudakis,et al. Algorithm Selection using Reinforcement Learning , 2000, ICML.
[64] Joel H. Saltz,et al. Programming Irregular Applications: Runtime Support, Compilation and Tools , 1997, Adv. Comput.
[65] Yang Yang,et al. Automatic Library Generation for BLAS3 on GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[66] Keith D. Cooper,et al. Optimizing for reduced code space using genetic algorithms , 1999, LCTES '99.
[67] R. C. Whaley,et al. Timing high performance kernels through empirical compilation , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[68] Bart Selman,et al. Algorithm portfolios , 2001, Artif. Intell.
[69] Andrei Alexandrescu,et al. Modern C++ design: generic programming and design patterns applied , 2001 .
[70] Benoît Meister,et al. A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction , 2010, GPGPU-3.
[71] Samuel Williams,et al. Optimization of geometric multigrid for emerging multi- and manycore processors , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[72] Ivana Kruijff-Korbayová,et al. A Portfolio Approach to Algorithm Selection , 2003, IJCAI.
[73] Albert Cohen,et al. Predictive modeling in a polyhedral optimization space , 2011, CGO 2011.
[74] A. Nakano,et al. Divide-and-conquer density functional theory on hierarchical real-space grids: Parallel implementation and applications , 2008 .
[75] John R. Rice,et al. The Algorithm Selection Problem , 1976, Adv. Comput.
[76] Ken Kennedy,et al. Model-guided empirical tuning of loop fusion , 2008, Int. J. High Perform. Syst. Archit.
[77] William J. Dally,et al. A tuning framework for software-managed memory hierarchies , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[78] Larry Carter,et al. Compile-time composition of run-time data and iteration reorderings , 2003, PLDI '03.
[79] Boyana Norris,et al. Autotuning Stencil-Based Computations on GPUs , 2012, 2012 IEEE International Conference on Cluster Computing.
[80] Jack J. Dongarra,et al. A comparison of search heuristics for empirical code optimization , 2008, 2008 IEEE International Conference on Cluster Computing.
[81] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[82] Andrew S. Grimshaw,et al. Scalable GPU graph traversal , 2012, PPoPP '12.
[83] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[84] James Demmel,et al. Statistical Models for Empirical Search-Based Performance Tuning , 2004, Int. J. High Perform. Comput. Appl.
[85] Nancy M. Amato,et al. STAPL: standard template adaptive parallel library , 2010, SYSTOR '10.
[86] Henri-Pierre Charles,et al. OCEANS: Optimizing Compilers for Embedded Applications , 1998, European Conference on Parallel Processing.
[87] Michael F. P. O'Boyle,et al. Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[88] Stephen F. Smith,et al. Proceedings: The Fourth International Conference on Artificial Intelligence Planning Systems , 1998 .
[89] Jacqueline Chame,et al. A script-based autotuning compiler system to generate high-performance CUDA code , 2013, TACO.
[90] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.