Optimizing for KNL Usage Modes When Data Doesn't Fit in MCDRAM
Stephen L. Olivier | Simon D. Hammond | Jonathan W. Berry | Peter M. Kogge | Neil A. Butcher