AARTS: low overhead online adaptive auto-tuning
[1] R. C. Whaley, et al. Timing high performance kernels through empirical compilation, 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[2] J. Ramanujam, et al. DynTile: Parametric tiled loop generation for parallel execution on multicore processors, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Michael F. P. O'Boyle, et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping, 2009, PLDI '09.
[4] Vaidy S. Sunderam, et al. PVM: A Framework for Parallel Distributed Computing, 1990, Concurr. Pract. Exp.
[5] Richard W. Vuduc, et al. Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization, 2009, LCPC.
[6] Santosh Pande, et al. Input-driven dynamic execution prediction of streaming applications, 2010, PPoPP '10.
[7] Rudolf Eigenmann, et al. Experiences in Using Cetus for Source-to-Source Transformations, 2004, LCPC.
[8] Jeffrey S. Vetter, et al. Autopilot: adaptive control of distributed applications, 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).
[9] Michael Voss, et al. High-level adaptive program optimization with ADAPT, 2001, PPoPP '01.
[10] Markus Mock, et al. DyC: an expressive annotation-directed dynamic compiler for C, 2000, Theor. Comput. Sci.
[11] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[12] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[13] Chun Chen, et al. A scalable auto-tuning framework for compiler optimization, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[14] I-Hsin Chung, et al. Active Harmony: Towards Automated Performance Tuning, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[15] John A. Nelder, et al. A Simplex Method for Function Minimization, 1965, Comput. J.
[16] Tomàs Margalef, et al. MATE: Monitoring, Analysis and Tuning Environment for parallel/distributed applications: Research Articles, 2007.
[17] Ümit V. Çatalyürek, et al. Optimizing dataflow applications on heterogeneous environments, 2010, Cluster Computing.
[18] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[19] Chun Chen, et al. Speeding up Nek5000 with autotuning and specialization, 2010, ICS '10.
[20] Mary W. Hall, et al. CHiLL: A Framework for Composing High-Level Loop Transformations, 2007.
[21] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[22] Rudolf Eigenmann, et al. Automatically Tuning Parallel and Parallelized Programs, 2009, LCPC.
[23] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[24] Richard W. Vuduc, et al. Sparsity: Optimization Framework for Sparse Matrix Kernels, 2004, Int. J. High Perform. Comput. Appl.