Memory access patterns: the missing piece of the multi-GPU puzzle
Amnon Barak | Tal Ben-Nun | Ely Levy | Eri Rubin
[1] Martin Lilleeng Sætra,et al. Shallow Water Simulations on Multiple GPUs , 2010, PARA.
[2] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.
[3] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[4] Stefan Marr,et al. Partitioned Global Address Space Languages , 2015, ACM Comput. Surv..
[5] Rice University,et al. High Performance Fortran language specification , 1993, FORF.
[6] Martin Gardner. Mathematical games: the fantastic combinations of John Conway's new solitaire game "life" , 1970 .
[7] D Bonachea,et al. UPC Language and Library Specifications, Version 1.3 , 2013 .
[8] Inanc Senocak,et al. CUDA Implementation of a Navier-Stokes Solver on Multi-GPU Desktop Platforms for Incompressible Flows , 2009 .
[9] MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction , 2014 .
[10] Alex Krizhevsky,et al. One weird trick for parallelizing convolutional neural networks , 2014, ArXiv.
[11] Lawrence Snyder,et al. A programmer's guide to ZPL , 1999 .
[12] Martin Uecker,et al. A Multi-GPU Programming Library for Real-Time Applications , 2012, ICA3PP.
[13] Francisco Tirado,et al. NMF-mGPU: non-negative matrix factorization on multi-GPU systems , 2015, BMC Bioinformatics.
[14] Wolfgang Straßer,et al. A Parallel Preconditioned Conjugate Gradient Solver for the Poisson Problem on a Multi-GPU Platform , 2010, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing.
[15] Jungwon Kim,et al. Achieving a single compute device image in OpenCL for multiple GPUs , 2011, PPoPP '11.
[16] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[17] Pablo Tamayo,et al. Metagenes and molecular pattern discovery using matrix factorization , 2004, Proceedings of the National Academy of Sciences of the United States of America.
[18] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[19] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[20] Uday Bondhugula,et al. Automatic data allocation and buffer management for multi-GPU machines , 2013, TACO.
[21] Samy Bengio,et al. Torch: a modular machine learning software library , 2002 .
[22] Bradford L. Chamberlain,et al. Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..
[23] Sergei Gorlatch,et al. Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[24] Giovanni Gallo,et al. Advances in Multi-GPU Smoothed Particle Hydrodynamics Simulations , 2014, IEEE Transactions on Parallel and Distributed Systems.
[25] Yann LeCun,et al. The MNIST database of handwritten digits , 2005 .
[26] Christoph W. Kessler,et al. SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.
[27] Ioannis E. Venetis,et al. High performance MRI simulations of motion on multi-GPU systems , 2014, Journal of Cardiovascular Magnetic Resonance.