Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads
Fernando Magno Quintão Pereira | Wagner Meira Jr | Sylvain Collange | Renato Ferreira | Teo Milanez | Caroline Collange
[1] Kevin Skadron, et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance, 2010, ISCA.
[2] Philip J. Hatcher, et al. Compiling C* programs for a hypercube multicomputer, 1988, PPoPP 1988.
[3] Fernando Magno Quintão Pereira, et al. Data and Instruction Uniformity in Minimal Multi-threading, 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[4] Yi Yang, et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.
[5] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[6] Frederica Darema, et al. A single-program-multiple-data computational model for EPEX/FORTRAN, 1988, Parallel Comput.
[7] Paola Bonizzoni, et al. An approximation algorithm for the shortest common supersequence problem: an experimental analysis, 2001, SAC.
[8] Jack L. Lo, et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[9] Sudhakar Yalamanchili, et al. SIMD re-convergence at thread frontiers, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[11] Yao Zhang, et al. Dynamic Detection of Uniform and Affine Vectors in GPGPU Computations, 2009, Euro-Par Workshops.
[12] Amirali Baniasadi, et al. Performance in GPU Architectures: Potentials and Distances, 2011.
[13] Dongrui Fan, et al. Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[14] Wen-mei W. Hwu, et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, 2008, PPoPP.
[15] José González, et al. Thread fusion, 2008, Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08).