Look before You Leap: Using the Right Hardware Resources to Accelerate Applications