Many-Task Computing on Many-Core Architectures
Ioan Raicu | Pedro Valero-Lara | Poornima Nookala | Fernando López Pelayo | Johan Jansson | Serapheim Dimitropoulos
[1] Mauricio Marín, et al. kNN Query Processing in Metric Spaces Using GPUs, 2011, Euro-Par.
[2] Kevin Skadron, et al. Enabling Task Parallelism in the CUDA Scheduler, 2009.
[3] Federico Silla, et al. Performance of CUDA Virtualized Remote GPUs in High Performance Clusters, 2011, 2011 International Conference on Parallel Processing.
[4] Pedro Valero-Lara, et al. Full-Overlapped Concurrent Kernels, 2015.
[5] Michael Klemm, et al. OpenMP Programming on Intel Xeon Phi Coprocessors: An Early Performance Comparison, 2012, MARC@RWTH.
[6] Kevin Skadron, et al. Fine-grained resource sharing for concurrent GPGPU kernels, 2012, HotPar'12.
[7] Pedro Valero-Lara, et al. Analysis in performance and new model for multiple kernels executions on many-core architectures, 2013, 2013 IEEE 12th International Conference on Cognitive Informatics and Cognitive Computing.
[8] Jack J. Dongarra, et al. Batched matrix computations on hardware accelerators based on GPUs, 2015, Int. J. High Perform. Comput. Appl.
[9] Manuel Prieto, et al. Block Tridiagonal Solvers on Heterogeneous Architectures, 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.
[10] Pedro Valero-Lara, et al. Multi-GPU acceleration of DARTEL (early detection of Alzheimer), 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[11] Thierry Gautier, et al. Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[12] Dirk Schmidl, et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi, 2013, Euro-Par.
[13] Diego Cazorla, et al. A GPU-based implementation of the MRF algorithm in ITK package, 2011, The Journal of Supercomputing.
[14] Pedro Valero-Lara. A GPU approach for accelerating 3D deformable registration (DARTEL) on brain biomedical images, 2013, EuroMPI.
[15] James Reinders, et al. Intel Xeon Phi Coprocessor High Performance Programming, 2013.
[16] David Hunter, et al. Just the Facts, 2010.
[17] Pavel Zezula, et al. Multi-level Clustering on Metric Spaces Using a Multi-GPU Platform, 2013, Euro-Par.
[18] Daniel S. Katz, et al. Design and evaluation of the gemtc framework for GPU-enabled many-task computing, 2014, HPDC '14.
[19] D. Geer, et al. Chip makers turn to multicore processors, 2005, Computer.
[20] Jack J. Dongarra, et al. Optimization for performance and energy for batched matrix computations on GPUs, 2015, GPGPU@PPoPP.
[21] Karthikeyan Sankaralingam, et al. Dark Silicon and the End of Multicore Scaling, 2012, IEEE Micro.
[22] Konstantinos G. Margaritis, et al. Multiple String Matching on a GPU using CUDA, 2015, Scalable Comput. Pract. Exp.
[23] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[24] Pedro Valero-Lara, et al. Towards a More Efficient Use of GPUs, 2011, 2011 International Conference on Computational Science and Its Applications.
[25] Juan Gómez-Luna, et al. Performance models for CUDA streams on NVIDIA GeForce series, 2012.
[26] Shinpei Kato, et al. TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments, 2011, USENIX Annual Technical Conference.
[27] Edward T. Grochowski, et al. Larrabee: A many-Core x86 architecture for visual computing, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[28] Fumihiko Ino, et al. A middleware for efficient stream processing in CUDA, 2010, Computer Science - Research and Development.
[29] Jason Maassen, et al. Performance Models for CPU-GPU Data Transfers, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[30] Manuel Prieto, et al. Fast finite difference Poisson solvers on heterogeneous architectures, 2014, Comput. Phys. Commun.
[31] Grigori Fursin, et al. Predictive Runtime Code Scheduling for Heterogeneous Architectures, 2008, HiPEAC.
[32] Ioan Raicu, et al. Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors, 2013.
[33] Baifeng Wu, et al. Task Scheduling Greedy Heuristics for GPU Heterogeneous Cluster Involving the Weights of the Processor, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[34] Ioan Raicu, et al. Evaluating the Support of MTC Applications on Intel Xeon Phi Many-Core Accelerators, 2015, 2015 IEEE International Conference on Cluster Computing.
[35] Jack J. Dongarra, et al. Towards batched linear solvers on accelerated hardware platforms, 2015, PPOPP.
[36] Margaret Martonosi, et al. Power-Efficient Computer Architectures: Recent Advances, 2014, Power-Efficient Computer Architectures: Recent Advances.
[37] Dawid Pajak. General-Purpose Computation Using Graphics Hardware for Fast HDR Image Processing, 2007.
[38] Yao Zhang, et al. Fast tridiagonal solvers on the GPU, 2010, PPoPP '10.