Prefetching and cache management using task lifetimes
暂无分享,去创建一个
Dionisios N. Pnevmatikatos | Dimitrios S. Nikolopoulos | Vassilis Papaefstathiou | Manolis Katevenis | M. Katevenis | D. Pnevmatikatos | Vassilis D. Papaefstathiou
[1] Srinivas Devadas,et al. Software-assisted cache replacement mechanisms for embedded systems , 2001, IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281).
[2] Bruce Jacob,et al. DRAMSim2: A Cycle Accurate Memory System Simulator , 2011, IEEE Computer Architecture Letters.
[3] Dionisios N. Pnevmatikatos,et al. Formic: Cost-efficient and Scalable Prototyping of Manycore Architectures , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.
[4] William J. Dally,et al. Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[5] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[6] Dimitrios S. Nikolopoulos,et al. A Unified Scheduler for Recursive and Task Dataflow Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[7] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[8] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..
[9] Arch D. Robison,et al. Intel® Threading Building Blocks (TBB) , 2011, Encyclopedia of Parallel Computing.
[10] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[11] George C. Necula,et al. CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs , 2002, CC.
[12] Mendel Rosenblum,et al. Streamware: programming general-purpose multicore processors using streams , 2008, ASPLOS.
[13] Yi Guo,et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler , 2010, IPDPS.
[14] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[15] Polyvios Pratikakis,et al. BDDT:: block-level dynamic dependence analysisfor deterministic task-based parallelism , 2012, PPoPP '12.
[16] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[17] Nick Knupffer. Intel Corporation , 2018, The Grants Register 2019.
[18] J. Demmel,et al. Sun Microsystems , 1996 .
[19] William J. Dally,et al. Architectural Support for the Stream Execution Model on General-Purpose Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[20] Kathryn S. McKinley,et al. Guided region prefetching: a cooperative hardware/software approach , 2003, ISCA '03.
[21] Gurindar S. Sohi,et al. Serialization sets: a dynamic dependence-based parallel execution model , 2009, PPoPP '09.
[22] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[23] Jesús Labarta,et al. Handling task dependencies under strided and aliased references , 2010, ICS '10.
[24] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[25] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[26] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[27] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[28] Andrew Brownsword,et al. Synchronization via scheduling: techniques for efficiently managing shared state , 2011, PLDI '11.
[29] Mikko H. Lipasti,et al. Stealth prefetching , 2006, ASPLOS XII.
[30] Jeffrey Overbey,et al. A type and effect system for deterministic parallel Java , 2009, OOPSLA '09.
[31] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[32] Marcelo Cintra,et al. Stream chaining: exploiting multiple levels of correlation in data prefetching , 2009, ISCA '09.
[33] Arnold L. Rosenberg,et al. Using the compiler to improve cache replacement decisions , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.
[34] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.