Toward multi-target autotuning for accelerators
[1] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[2] Allen D. Malony, et al. Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs, 2011, 2011 International Conference on Parallel Processing.
[3] Allen D. Malony, et al. Design and implementation of a parallel performance data management framework, 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[4] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[5] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[6] P. Sadayappan, et al. Annotation-based empirical performance tuning using Orio, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[7] Albert Cohen, et al. PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests, 2009.
[8] Prasanna Balaprakash, et al. An Experimental Study of Global and Local Search Algorithms in Empirical Performance Tuning, 2012, VECPAR.
[9] D. Keyes, et al. Jacobian-free Newton-Krylov methods: a survey of approaches and applications, 2004.
[10] William Gropp, et al. Annotations for Productivity and Performance Portability, 2007.
[11] Allen D. Malony, et al. The TAU Parallel Performance System, 2005.
[12] Elizabeth R. Jessup, et al. Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes, 2009, ICCS.
[13] Allen D. Malony, et al. Tools for machine-learning-based empirical autotuning and specialization, 2013, Int. J. High Perform. Comput. Appl.
[14] Boyana Norris, et al. Autotuning Stencil-Based Computations on GPUs, 2012, 2012 IEEE International Conference on Cluster Computing.
[15] Allen D. Malony, et al. ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis, 2003, Euro-Par.
[16] Allen D. Malony, et al. Knowledge support and automation for performance analysis with PerfExplorer 2.0, 2008, Sci. Program.
[17] Karl Rupp, et al. An automatic OpenCL compute kernel generator for basic linear algebra operations, 2012, HiPC 2012.
[18] Allen D. Malony, et al. PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[19] P. Sadayappan, et al. Stencil-Aware GPU Optimization of Iterative Solvers, 2013, SIAM J. Sci. Comput.
[20] William Gropp, et al. Efficient Management of Parallelism in Object-Oriented Numerical Software Libraries, 1997, SciTools.