Optimizing off-chip accesses in multicores
Mahmut T. Kandemir | Emre Kultursay | Yuanrui Zhang | Wei Ding | Xulong Tang
[1] Natalie D. Enright Jerger,et al. Achieving predictable performance through better memory controller placement in many-core CMPs , 2009, ISCA '09.
[2] Uday Bondhugula,et al. Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[3] Anoop Gupta,et al. Operating system support for improving data locality on CC-NUMA compute servers , 1996, ASPLOS VII.
[4] Mor Harchol-Balter,et al. Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[5] Michael F. P. O'Boyle,et al. Non-singular data transformations: definition, validity and applications , 1997, ICS '97.
[6] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[7] David A. Padua,et al. Compiler Techniques for the Distribution of Data and Computation , 2003, IEEE Trans. Parallel Distributed Syst.
[8] Reetuparna Das,et al. Application-to-core mapping policies to reduce memory system interference in multi-core systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[9] Alexander Schrijver,et al. Theory of linear and integer programming , 1986, Wiley-Interscience series in discrete mathematics and optimization.
[10] Vivien Quéma,et al. Traffic management: a holistic approach to memory placement on NUMA systems , 2013, ASPLOS '13.
[11] Hannu Tenhunen,et al. Optimal memory controller placement for chip multiprocessor , 2011, 2011 Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[12] Frank Mueller,et al. Feedback-directed page placement for ccNUMA via hardware-generated memory traces , 2010, J. Parallel Distributed Comput.
[13] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[14] A. Snavely,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[15] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[16] John Zahorjan,et al. Optimizing Data Locality by Array Restructuring , 1995 .
[17] M. Franz,et al. Splitting Data Objects to Increase Cache Utilization (Preliminary Version, 9th October 1998) , 1998 .
[18] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[19] Kei Hiraki,et al. Unified memory optimizing architecture: memory subsystem control with a unified predictor , 2012, ICS '12.
[20] Luca Benini,et al. Networks on chips - technology and tools , 2006, The Morgan Kaufmann series in systems on silicon.
[21] Mahmut T. Kandemir,et al. Neighborhood-aware data locality optimization for NoC-based multicores , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[22] Wei Ding,et al. Optimizing Off-Chip Accesses in Manycores , 2015 .
[23] Thomas R. Gross,et al. Matching memory access patterns and data placement for NUMA systems , 2012, CGO '12.
[24] Todd C. Mowry,et al. Compiler-directed page coloring for multiprocessors , 1996, ASPLOS VII.
[25] Alberto Ros,et al. Distance-aware round-robin mapping for large NUCA caches , 2009, 2009 International Conference on High Performance Computing (HiPC).
[26] Mor Harchol-Balter,et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers , 2010 .
[27] Hyunjin Lee,et al. A flexible data to L2 cache mapping approach for future multicore processors , 2006, MSPC '06.
[28] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[29] Monica S. Lam,et al. Data and computation transformations for multiprocessors , 1995, PPOPP '95.
[30] Ryan N. Rakvic,et al. Replacement techniques for dynamic NUCA cache designs on CMPs , 2013, The Journal of Supercomputing.