Many-Task Computing on Many-Core Architectures

Many-Task Computing (MTC) is a common scenario for multiple parallel systems, such as clusters, grids, clouds, and supercomputers, but it is less common on shared-memory parallel processors. In thi ...
