Test-driving Intel Xeon Phi
Jianbin Fang | Henk J. Sips | Ana Lucia Varbanescu | Yonggang Che | Lilun Zhang | Chuanfu Xu
[1] Andreas Moshovos,et al. Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[2] Alessio Sclocco,et al. Radio Astronomy Beam Forming on Many-Core Architectures , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[3] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[4] Kevin Skadron,et al. Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[5] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[6] David R. Butenhof. Programming with POSIX threads , 1993 .
[7] Jim Jeffers. Intel® Xeon Phi™ Coprocessors , 2013 .
[8] Alan Jay Smith,et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes , 1995, IEEE Trans. Computers.
[9] Thomas Fahringer,et al. Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design , 2011, Euro-Par.
[10] Sabela Ramos,et al. Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi , 2013, HPDC.
[11] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[12] Carl Staelin,et al. lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.
[13] Pradeep Dubey,et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU , 2010, ISCA.
[14] Scott T. Acton,et al. Motion gradient vector flow: an external force for tracking rolling leukocytes with shape and size constrained active contours , 2004, IEEE Transactions on Medical Imaging.
[15] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Carl Staelin,et al. Memory hierarchy performance measurement of commercial dual-core desktop processors , 2008, J. Syst. Archit..
[17] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[18] Rob van Nieuwpoort,et al. Building high-resolution sky images using the Cell/B.E. , 2009, Sci. Program..
[19] Matthias S. Müller,et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[20] Henk Sips,et al. Benchmarking Intel Xeon Phi to Guide Kernel Design , 2022, Parallel and Distributed Systems Report Series.