Paper Information - Many-Task Computing on Many-Core Architectures

Many-Task Computing on Many-Core Architectures

Many-Task Computing (MTC) is a common scenario on many parallel systems, such as clusters, grids, clouds, and supercomputers, but it is less common on shared-memory parallel processors. In thi ...

[1] Mauricio Marín, et al. kNN Query Processing in Metric Spaces Using GPUs, 2011, Euro-Par.

[2] Kevin Skadron, et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.

[3] Federico Silla, et al. Performance of CUDA Virtualized Remote GPUs in High Performance Clusters, 2011, 2011 International Conference on Parallel Processing.

[4] Pedro Valero-Lara, et al. Full-Overlapped Concurrent Kernels, 2015.

[5] Michael Klemm, et al. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison, 2012, MARC@RWTH.

[6] Kevin Skadron, et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar'12.

[7] Pedro Valero-Lara, et al. Analysis in performance and new model for multiple kernels executions on many-core architectures, 2013, 2013 IEEE 12th International Conference on Cognitive Informatics and Cognitive Computing.

[8] Jack J. Dongarra, et al. Batched matrix computations on hardware accelerators based on GPUs, 2015, Int. J. High Perform. Comput. Appl.

[9] Manuel Prieto, et al. Block Tridiagonal Solvers on Heterogeneous Architectures, 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.

[10] Pedro Valero-Lara, et al. Multi-GPU acceleration of DARTEL (early detection of Alzheimer), 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).

[11] Thierry Gautier, et al. Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.

[12] Dirk Schmidl, et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi, 2013, Euro-Par.

[13] Diego Cazorla, et al. A GPU-based implementation of the MRF algorithm in ITK package, 2011, The Journal of Supercomputing.

[14] Pedro Valero-Lara. A GPU approach for accelerating 3D deformable registration (DARTEL) on brain biomedical images, 2013, EuroMPI.

[15] James Reinders, et al. Intel Xeon Phi Coprocessor High Performance Programming, 2013.

[16] David Hunter, et al. Just the Facts, 2010.

[17] Pavel Zezula, et al. Multi-level Clustering on Metric Spaces Using a Multi-GPU Platform, 2013, Euro-Par.

[18] Daniel S. Katz, et al. Design and evaluation of the gemtc framework for GPU-enabled many-task computing, 2014, HPDC '14.

[19] D. Geer, et al. Chip makers turn to multicore processors, 2005, Computer.

[20] Jack J. Dongarra, et al. Optimization for performance and energy for batched matrix computations on GPUs, 2015, GPGPU@PPoPP.

[21] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.

[22] Konstantinos G. Margaritis, et al. Multiple String Matching on a GPU using CUDAs, 2015, Scalable Comput. Pract. Exp.

[23] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.

[24] Pedro Valero-Lara, et al. Towards a More Efficient Use of GPUs, 2011, 2011 International Conference on Computational Science and Its Applications.

[25] Juan Gómez-Luna, et al. Performance models for CUDA streams on NVIDIA GeForce series, 2012.

[26] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.

[27] Edward T. Grochowski, et al. Larrabee: A many-Core x86 architecture for visual computing, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[28] Fumihiko Ino, et al. A middleware for efficient stream processing in CUDA, 2010, Computer Science - Research and Development.

[29] Jason Maassen, et al. Performance Models for CPU-GPU Data Transfers, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[30] Manuel Prieto, et al. Fast finite difference Poisson solvers on heterogeneous architectures, 2014, Comput. Phys. Commun.

[31] Grigori Fursin, et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures, 2008, HiPEAC.

[32] Ioan Raicu, et al. Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors, 2013.

[33] Baifeng Wu, et al. Task Scheduling Greedy Heuristics for GPU Heterogeneous Cluster Involving the Weights of the Processor, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[34] Ioan Raicu, et al. Evaluating the Support of MTC Applications on Intel Xeon Phi Many-Core Accelerators, 2015, 2015 IEEE International Conference on Cluster Computing.

[35] Jack J. Dongarra, et al. Towards batched linear solvers on accelerated hardware platforms, 2015, PPOPP.

[36] Margaret Martonosi, et al. Power-Efficient Computer Architectures: Recent Advances, 2014, Power-Efficient Computer Architectures: Recent Advances.

[37] Dawid Pajak. General-Purpose Computation Using Graphics Hardware for Fast HDR Image Processing, 2007.

[38] Yao Zhang, et al. Fast tridiagonal solvers on the GPU, 2010, PPoPP '10.