Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis
[1] Ozalp Babaoglu,et al. ACM Transactions on Computer Systems , 2007 .
[2] Donald Yeung,et al. Studying multicore processor scaling via reuse distance analysis , 2013, ISCA.
[3] A. Agarwal,et al. Control-theoretical CPU allocation: Design and Implementation with Feedback Control , 2011 .
[4] Erik Hagersten,et al. Fast data-locality profiling of native execution , 2005, SIGMETRICS '05.
[5] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[6] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[7] Donald Yeung,et al. Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis , 2012, MSPC '12.
[8] Chen Sun,et al. DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[9] Chen Ding,et al. A Composable Model for Analyzing Locality of Multi-threaded Programs , 2009 .
[10] James E. Smith,et al. Advanced Micro Devices , 2005 .
[11] David Eklov,et al. Fast modeling of shared caches in multicore systems , 2011, HiPEAC.
[12] Sally A. McKee,et al. Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.
[13] Brian Rogers,et al. Scaling the bandwidth wall: challenges in and avenues for CMP scaling , 2009, ISCA '09.
[14] Kevin Skadron,et al. CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[15] David M. Brooks,et al. Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.
[16] Dong-Sheng Wang,et al. Hierarchical Cache Directory for CMP , 2010, Journal of Computer Science and Technology.
[17] Kunle Olukotun,et al. Maximizing CMP throughput with mediocre cores , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[18] Lieven Eeckhout,et al. The Multi-Program Performance Model: Debunking current practice in multi-core simulation , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).
[19] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[20] Gabriel H. Loh,et al. 3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.
[21] Deborah A. Wallach. PHD: A Hierarchical Cache Coherent Protocol , 1992 .
[22] Apan Qasem,et al. Evaluating a Model for Cache Conflict Miss Prediction , 2005 .
[23] Xipeng Shen,et al. Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors? , 2010, CC.
[24] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[25] Peter J. Denning,et al. The working set model for program behavior , 1968, CACM.
[26] David Eklov,et al. StatStack: Efficient modeling of LRU caches , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[27] Donald Yeung,et al. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[28] Milind Kulkarni,et al. Accelerating multicore reuse distance analysis with sampling and parallelization , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[29] Chen Ding,et al. Miss rate prediction across all program inputs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[30] Derek L. Schuff,et al. Multicore-aware reuse distance analysis , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[31] Erik Hagersten,et al. StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[32] Donald Yeung,et al. Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs , 2013 .
[33] Berkin Özisikyilmaz,et al. MineBench: A Benchmark Suite for Data Mining Workloads , 2006, 2006 IEEE International Symposium on Workload Characterization.
[34] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI.
[35] Srihari Makineni,et al. Exploring the cache design space for large scale CMPs , 2005, CARN.
[36] Jaehyuk Huh,et al. Exploring the design space of future CMPs , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[37] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[38] Jian Li,et al. Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[39] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[40] R. Iyer,et al. Performance, Area and Bandwidth Implications on Large-scale CMP Cache Design , 2007 .
[41] Chen Ding,et al. Program locality analysis using reuse distance , 2009, TOPL.
[42] Jung Ho Ahn,et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[43] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.