A distributed OpenCL framework using redundant computation and data replication
[1] Sushil Jajodia,et al. An adaptive data replication algorithm , 1997, TODS.
[2] Sandro Fiore,et al. Towards Exascale Distributed Data Management , 2009, Int. J. High Perform. Comput. Appl..
[3] Jaejin Lee,et al. Hiding relaxed memory consistency with compilers , 2000, Proceedings 2000 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00622).
[4] Laxmikant V. Kalé,et al. Programming heterogeneous clusters with accelerators using object-based programming , 2011, Sci. Program..
[5] Uday Bondhugula,et al. Effective automatic parallelization of stencil computations , 2007, PLDI '07.
[6] Alejandro Duran,et al. Productive Programming of GPU Clusters with OmpSs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[7] Tetsu Narumi,et al. DS-CUDA: A Middleware to Use Many GPUs in the Cloud Environment , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[8] Marek Olszewski,et al. Kendo: efficient deterministic multithreading in software , 2009, ASPLOS.
[9] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[10] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[11] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[12] Jungwon Kim,et al. OpenCL as a Programming Model for GPU Clusters , 2011, LCPC.
[13] Jack J. Dongarra,et al. Exascale computing and big data , 2015, Commun. ACM.
[14] Emery D. Berger,et al. Dthreads: efficient deterministic multithreading , 2011, SOSP.
[15] Ji Zhang,et al. Optimizing the Java Piped I/O Stream Library for Performance , 2002, LCPC.
[16] Carlos Reaño,et al. CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution , 2012, 2012 19th International Conference on High Performance Computing.
[17] Dan Grossman,et al. CoreDet: a compiler and runtime system for deterministic multithreaded execution , 2010, ASPLOS 2010.
[18] Jaejin Lee,et al. Performance characterization of the NAS Parallel Benchmarks in OpenCL , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[19] Leslie Lamport,et al. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs , 1979, IEEE Transactions on Computers.
[20] Ümit V. Çatalyürek,et al. Improving performance of adaptive component-based dataflow middleware , 2012, Parallel Comput..
[21] Takashi Nakamura,et al. Hybrid OpenCL: Connecting Different OpenCL Implementations over Network , 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[22] Jaejin Lee,et al. FaCSim: a fast and cycle-accurate architecture simulator for embedded systems , 2008, LCTES '08.
[23] Frederica Darema,et al. A single-program-multiple-data computational model for EPEX/FORTRAN , 1988, Parallel Comput..
[24] Thomas Fahringer,et al. LibWater: heterogeneous distributed computing made easy , 2013, ICS '13.
[25] Emery D. Berger,et al. Grace: safe multithreaded programming for C/C++ , 2009, OOPSLA '09.
[26] David A. Padua,et al. Compiler techniques for high performance sequentially consistent java programs , 2005, PPOPP.
[27] Alan L. Cox,et al. TreadMarks: shared memory computing on networks of workstations , 1996 .
[28] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[29] Sergei Gorlatch,et al. dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[30] Dan Grossman,et al. CoreDet: a compiler and runtime system for deterministic multithreaded execution , 2010, ASPLOS XV.
[31] Cédric Augonnet,et al. StarPU-MPI: Task Programming over Clusters of Machines Enhanced with Accelerators , 2012, EuroMPI.
[32] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[33] Wen-mei W. Hwu,et al. Program optimization carving for GPU computing , 2008, J. Parallel Distributed Comput..
[34] Jungwon Kim,et al. SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters , 2012, ICS '12.
[35] Federico Silla,et al. rCUDA: Reducing the number of GPU-based accelerators in high performance clusters , 2010, 2010 International Conference on High Performance Computing & Simulation.
[36] Alejandro Duran,et al. OmpSs: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[37] Luís Paulo Santos,et al. clOpenCL - Supporting Distributed Heterogeneous Computing in HPC Clusters , 2012, Euro-Par Workshops.
[38] Amnon Barak,et al. A package for OpenCL based heterogeneous computing on clusters with many GPU devices , 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).
[39] David A. Padua,et al. Basic compiler algorithms for parallel programs , 1999, PPoPP '99.
[40] Carlos Reaño,et al. A complete and efficient CUDA-sharing solution for HPC clusters , 2014, Parallel Comput..
[41] Dennis Shasha,et al. Efficient and correct execution of parallel programs that share memory , 1988, TOPLAS.