Trading cache hit rate for memory performance
Mahmut T. Kandemir | Chita R. Das | Wei Ding | Adwait Jog | Diana Guttman | Praveen Yedlapalli