Performance and energy evaluation of data prefetching on Intel Xeon Phi
Mahmut T. Kandemir | Diana Guttman | Meenakshi Arunachalam | Vlad Calina