Evaluation of Hardware Data Prefetchers on Server Processors
Hamid Sarbazi-Azad | Pejman Lotfi-Kamran | Mohammad Bakhshalipour | Seyedali Tabaeiaghdaei
[1] Christoforos E. Kozyrakis,et al. Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.
[2] Dean M. Tullsen,et al. Multithreading Architecture , 2013, Multithreading Architecture.
[3] Anastasia Ailamaki,et al. Improving hash join performance through prefetching , 2004, Proceedings. 20th International Conference on Data Engineering.
[4] Babak Falsafi,et al. Quantifying the Mismatch between Emerging Scale-Out Applications and Modern Processors , 2012, TOCS.
[5] Daniel A. Jiménez,et al. Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[6] Anantha Chandrakasan,et al. SMART: A single-cycle reconfigurable NoC for SoC applications , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[7] Josep Torrellas,et al. The memory performance of DSS commercial workloads in shared-memory multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[8] IBM Blue Gene team,et al. Design of the IBM Blue Gene/Q Compute chip , 2013, IBM J. Res. Dev.
[9] Aamer Jaleel,et al. Sandbox Prefetching: Safe run-time evaluation of aggressive prefetchers , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[10] James R. Larus,et al. Using Cohort-Scheduling to Enhance Server Performance , 2002, USENIX Annual Technical Conference, General Track.
[11] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[12] Trevor N. Mudge,et al. Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments , 2008, 2008 International Symposium on Computer Architecture.
[13] Thomas F. Wenisch,et al. Enhancing Server Efficiency in the Face of Killer Microseconds , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Calvin Lin,et al. Memory Prefetching Using Adaptive Stream Detection , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[15] Jaehyuk Huh,et al. Exploring the design space of future CMPs , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[16] Vivek Sarkar,et al. In-Register Parameter Caching for Dynamic Neural Nets with Virtual Persistent Processor Specialization , 2018, 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Hamid Sarbazi-Azad,et al. Bingo Spatial Data Prefetcher , 2019, 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[18] Pat Conway,et al. The AMD Opteron Northbridge Architecture , 2007, IEEE Micro.
[19] Mehdi Modarressi,et al. Fast Data Delivery for Many-Core Processors , 2018, IEEE Transactions on Computers.
[20] Trishul M. Chilimbi. On the stability of temporal data reference profiles , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[21] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[22] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[23] Jack Doweck,et al. Inside Intel® Core microarchitecture , 2006, 2006 IEEE Hot Chips 18 Symposium (HCS).
[24] Nael B. Abu-Ghazaleh,et al. CORF: Coalescing Operand Register File for GPUs , 2019, ASPLOS.
[25] Mehmet Kayaalp,et al. RIC: Relaxed Inclusion Caches for mitigating LLC side-channel attacks , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).
[26] Michael Gschwind,et al. The IBM Blue Gene/Q Compute Chip , 2012, IEEE Micro.
[27] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[28] Yuan Chou,et al. Low-Cost Epoch-Based Correlation Prefetching for Commercial Applications , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[29] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and pre , 1990, ISCA 1990.
[30] Jinchun Kim,et al. Path confidence based lookahead prefetching , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[31] Thomas F. Wenisch,et al. Spatial Memory Streaming , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[32] Onur Mutlu,et al. Mitigating Prefetcher-Caused Pollution Using Informed Caching Policies for Prefetched Blocks , 2014, ACM Trans. Archit. Code Optim.
[33] Todd C. Mowry,et al. Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.
[34] Francisco J. Cazorla,et al. Making data prefetch smarter: Adaptive prefetching on POWER7 , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[35] Gary Lauterbach,et al. UltraSPARC-III: designing third-generation 64-bit performance , 1999, IEEE Micro.
[36] Christopher J. Hughes,et al. Memory-side prefetching for linked data structures for processor-in-memory systems , 2005, J. Parallel Distributed Comput.
[37] Onur Mutlu,et al. Fairness via source throttling: a configurable and high-performance fairness substrate for multi-core memory systems , 2010, ASPLOS 2010.
[38] Babak Falsafi,et al. To Share or Not To Share? , 2007, VLDB.
[39] Kei Hiraki,et al. Access map pattern matching for data cache prefetch , 2009, ICS.
[40] Babak Falsafi,et al. Scale-out processors , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[41] Sally A. McKee,et al. Hardware-only stream prefetching and dynamic access ordering , 2000, ICS '00.
[42] Martin Burtscher,et al. Future execution: A prefetching mechanism that uses multiple cores to speed up single threads , 2006, TACO.
[43] Burton H. Bloom,et al. Space/time trade-offs in hash coding with allowable errors , 1970, CACM.
[44] Santosh G. Abraham,et al. Effective stream-based and execution-based data prefetching , 2004, ICS '04.
[45] Martin Hirzel,et al. Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.
[46] Carole-Jean Wu,et al. Characterization and dynamic mitigation of intra-application cache interference , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[47] Dean M. Tullsen,et al. Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.
[48] Jaejin Lee,et al. Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems , 2009, IEEE Transactions on Parallel and Distributed Systems.
[49] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[50] Onur Mutlu,et al. Prefetch-aware shared-resource management for multi-core systems , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[51] Thomas F. Wenisch,et al. Spatio-temporal memory streaming , 2009, ISCA '09.
[52] Hamid Sarbazi-Azad,et al. Near-Ideal Networks-on-Chip for Servers , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[53] Thomas F. Wenisch,et al. A Primer on Hardware Prefetching , 2014, A Primer on Hardware Prefetching.
[54] Josep Torrellas,et al. Using a user-level memory thread for correlation prefetching , 2002, ISCA.
[55] Pejman Lotfi-Kamran,et al. Cache Replacement Policy Based on Expected Hit Count , 2018, IEEE Computer Architecture Letters.
[56] Yen-Chen Liu,et al. Knights Landing: Second-Generation Intel Xeon Phi Product , 2016, IEEE Micro.
[57] C. Grünloh. To Share Or Not To Share , 2019, Case Medical Research.
[58] Onur Mutlu,et al. Accelerating Dependent Cache Misses with an Enhanced Memory Controller , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[59] James E. Smith,et al. Data Cache Prefetching Using a Global History Buffer , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[60] Onur Mutlu,et al. Prefetch-Aware DRAM Controllers , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[61] Mahmut T. Kandemir,et al. Meeting midway: Improving CMP performance with memory-side prefetching , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[62] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.
[63] Thomas F. Wenisch,et al. The Queuing-First Approach for Tail Management of Interactive Services , 2019, IEEE Micro.
[64] John Paul Shen,et al. Scaling and characterizing database workloads: bridging the gap between research and practice , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[65] Hamid Sarbazi-Azad,et al. LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching , 2018, ASPLOS.
[66] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[67] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[68] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[69] Hamid Sarbazi-Azad,et al. An Efficient Hybrid-Switched Network-on-Chip for Chip Multiprocessors , 2016, IEEE Transactions on Computers.
[70] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[71] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[72] Babak Falsafi,et al. NOC-Out: Microarchitecting a Scale-Out Processor , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[73] Zhenman Fang,et al. Multi-stage coordinated prefetching for present-day processors , 2014, ICS '14.
[74] Christopher Hughes,et al. Speculative precomputation: long-range prefetching of delinquent loads , 2001, ISCA 2001.
[75] Hamid Sarbazi-Azad,et al. Domino Temporal Data Prefetcher , 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[76] Ian H. Witten,et al. Identifying Hierarchical Structure in Sequences: A linear-time algorithm , 1997, J. Artif. Intell. Res.
[77] G. Sohi,et al. Effective jump-pointer prefetching for linked data structures , 1999, Proceedings of the 26th International Symposium on Computer Architecture (Cat. No.99CB36367).
[78] Pejman Lotfi-Kamran,et al. An Efficient Temporal Data Prefetcher for L1 Caches , 2017, IEEE Computer Architecture Letters.
[79] Thomas F. Wenisch,et al. Practical off-chip meta-data for temporal memory streaming , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[80] Babak Falsafi,et al. Accurate and complexity-effective spatial pattern prediction , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[81] Brad Calder,et al. A Decoupled Predictor-Directed Stream Prefetching Architecture , 2003, IEEE Trans. Computers.
[82] Sarita V. Adve,et al. Performance of database workloads on shared-memory systems with out-of-order processors , 1998, ASPLOS VIII.
[83] Babak Falsafi,et al. Predictor virtualization , 2008, ASPLOS.
[84] Balaram Sinharoy,et al. POWER4 system microarchitecture , 2002, IBM J. Res. Dev.
[85] Gu-Yeon Wei,et al. Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[86] Vivek Sarkar,et al. RegMutex: Inter-Warp GPU Register Time-Sharing , 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[87] Reena Panda,et al. B-Fetch: Branch Prediction Directed Prefetching for Chip-Multiprocessors , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[88] Hamid Sarbazi-Azad,et al. Scale-Out Processors & Energy Efficiency , 2018, ArXiv.
[89] Per Stenström,et al. Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.
[90] Jan Reineke,et al. Ascertaining Uncertainty for Efficient Exact Cache Analysis , 2017, CAV.
[91] Calvin Lin,et al. Linearizing irregular memory accesses for improved correlated prefetching , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[92] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[93] Haitham Akkary,et al. A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[94] Douglas J. Joseph,et al. Prefetching Using Markov Predictors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[95] Seth H. Pugsley,et al. Efficiently prefetching complex address patterns , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[96] Mikko H. Lipasti,et al. Stealth prefetching , 2006, ASPLOS XII.
[97] Mahmut T. Kandemir,et al. Adaptive prefetching for shared cache based chip multiprocessors , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[98] Mor Harchol-Balter,et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010.
[99] Babak Falsafi,et al. Database Servers on Chip Multiprocessors: Limitations and Opportunities , 2007, CIDR.
[100] Pierre Michaud. Best-offset hardware prefetching , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[101] Weifeng Zhang,et al. A self-repairing prefetcher in an event-driven dynamic optimization framework , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[102] Onur Mutlu,et al. Software-Based Online Detection of Hardware Defects: Mechanisms, Architectural Support, and Evaluation , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[103] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[104] Susan J. Eggers,et al. An analysis of database workload performance on simultaneous multithreaded processors , 1998, ISCA.
[105] Sanjeev Kumar,et al. Exploiting spatial locality in data caches using spatial footprints , 1998, ISCA.
[106] Marcelo Cintra,et al. Stream chaining: exploiting multiple levels of correlation in data prefetching , 2009, ISCA '09.
[107] Thomas F. Wenisch,et al. Temporal streaming of shared memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[108] Junfeng Yang,et al. Stable Deterministic Multithreading through Schedule Memoization , 2010, OSDI.
[109] Onur Mutlu,et al. Coordinated control of multiple prefetchers in multi-core systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[110] Sparsh Mittal,et al. A Survey of Recent Prefetching Techniques for Processor Caches , 2016, ACM Comput. Surv.
[111] Onur Mutlu,et al. Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[112] Babak Falsafi,et al. Optimizing Data-Center TCO with Scale-Out Processors , 2012, IEEE Micro.
[113] David J. DeWitt,et al. DBMSs on a Modern Processor: Where Does Time Go? , 1999, VLDB.
[114] K.J. Nesbit,et al. AC/DC: an adaptive data cache prefetcher , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[115] Hyesoon Kim,et al. Techniques for Efficient Processing in Runahead Execution Engines , 2005, ISCA 2005.
[116] Onur Mutlu,et al. Techniques for efficient processing in runahead execution engines , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[117] John Paul Shen,et al. Dynamic speculative precomputation , 2001, MICRO.
[118] Hamid Sarbazi-Azad,et al. Reducing Writebacks Through In-Cache Displacement , 2019, ACM Trans. Design Autom. Electr. Syst.
[119] Hamid Sarbazi-Azad,et al. Die-Stacked DRAM: Memory, Cache, or MemCache? , 2018, ArXiv.
[120] Brad Calder,et al. Predictor-directed stream buffers , 2000, MICRO 33.