Redefining the Role of the CPU in the Era of CPU-GPU Integration
Scott B. Baden | Manish Arora | Dean M. Tullsen | Siddhartha Nath | Subhra Mazumdar
[1] Timothy G. Mattson, et al. OpenCL Programming Guide, 2011.
[2] Mateo Valero, et al. Toward kilo-instruction processors, 2004, TACO.
[3] Lieven Eeckhout, et al. Microarchitecture-Independent Workload Characterization, 2007, IEEE Micro.
[4] Matthew D. Sinclair, et al. Porting CMP Benchmarks to GPUs, 2011.
[5] Jens H. Krüger, et al. A Survey of General-Purpose Computation on Graphics Hardware, 2007, Eurographics.
[6] Pradeep Dubey, et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, 2010, ISCA.
[7] M. Vignesh, et al. Scope for performance enhancement of CMU Sphinx by parallelising with OpenCL, 2011.
[8] Norman P. Jouppi, et al. Core architecture optimization for heterogeneous chip multiprocessors, 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Arun K. Somani, et al. Unstructured grid applications on GPU: performance analysis and improvement, 2011, GPGPU-4.
[10] Tom R. Halfhill. NVIDIA's Next-Generation CUDA Compute and Graphics Architecture, Code-Named Fermi, Adds Muscle for Parallel Processing, 2009.
[11] Emilio L. Zapata, et al. Simulation of quantum gates on a novel GPU architecture, 2007.
[12] John Paul Walters, et al. Evaluating the use of GPUs in liver image segmentation and HMMER database searches, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[13] Dirk Grunwald, et al. A stateless, content-directed data prefetching mechanism, 2002, ASPLOS X.
[14] Shane Ryoo, et al. Performance insights on executing non-graphics applications on CUDA on the NVIDIA GeForce 8800 GTX, 2007, 2007 IEEE Hot Chips 19 Symposium (HCS).
[15] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[16] Brad Calder, et al. Pointer cache assisted prefetching, 2002, MICRO.
[17] Volodymyr Kindratenko, et al. MILC on GPUs, 2011.
[18] André Seznec, et al. The L-TAGE Branch Predictor, 2007, J. Instr. Level Parallelism.
[19] Douglas J. Joseph, et al. Prefetching Using Markov Predictors, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[20] Kevin Skadron, et al. Parallelization of particle filter algorithms, 2010, ISCA'10.
[21] Tao Tang, et al. Program Optimization of Array-Intensive SPEC2k Benchmarks on Multithreaded GPU Using CUDA and Brook+, 2009, 2009 15th International Conference on Parallel and Distributed Systems.
[22] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[23] Kevin Skadron, et al. Experiences Accelerating MATLAB Systems Biology Applications, 2009.