Jianbin Fang | Henk J. Sips | Ana Lucia Varbanescu | Yonggang Che | Lilun Zhang | Chuanfu Xu
[1] Samuel Williams et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.
[2] Samuel Williams et al. Roofline: An Insightful Visual Performance Model for Multicore Architectures, 2009, CACM.
[3] David R. Butenhof. Programming with POSIX Threads, 1993.
[4] Christopher J. Hughes et al. Performance and Energy Implications of Many-Core Caches for Throughput Computing, 2010, IEEE Micro.
[5] David A. Patterson et al. Computer Architecture, Fifth Edition: A Quantitative Approach, 2011.
[6] Andreas Moshovos et al. Demystifying GPU Microarchitecture through Microbenchmarking, 2010, IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[7] Pradeep Dubey et al. Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU, 2010, ISCA.
[8] Matthias S. Müller et al. Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System, 2009, 18th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] Bradley C. Kuszmaul et al. Cilk: An Efficient Multithreaded Runtime System, 1995, PPoPP '95.
[10] Stephen A. Jarvis et al. Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi Coprocessors, 2013, IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS).
[11] John E. Stone et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[12] James Demmel et al. The Parallel Computing Landscape, 2022.
[13] Thomas Fahringer et al. Automatic OpenCL Device Characterization: Guiding Optimized Kernel Design, 2011, Euro-Par.
[14] Henk Sips et al. Benchmarking Intel Xeon Phi to Guide Kernel Design, 2022, Parallel and Distributed Systems Report Series.
[15] Carl Staelin et al. lmbench: Portable Tools for Performance Analysis, 1996, USENIX Annual Technical Conference.
[16] Sabela Ramos et al. Modeling Communication in Cache-Coherent SMP Systems: A Case-Study with Xeon Phi, 2013, HPDC.
[17] Dirk Schmidl et al. Assessing the Performance of OpenMP Programs on the Intel Xeon Phi, 2013, Euro-Par.
[18] Alan Jay Smith et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes, 1995, IEEE Trans. Computers.
[19] James Demmel et al. Benchmarking GPUs to Tune Dense Linear Algebra, 2008, SC '08: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Carl Staelin et al. Memory Hierarchy Performance Measurement of Commercial Dual-Core Desktop Processors, 2008, J. Syst. Archit.