Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis
[1] Jaehyuk Huh, et al. Exploring the design space of future CMPs, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[2] R. Iyer, et al. Performance, Area and Bandwidth Implications on Large-scale CMP Cache Design, 2007.
[3] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[4] Donald Yeung, et al. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[5] Milind Kulkarni, et al. Accelerating multicore reuse distance analysis with sampling and parallelization, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[6] Srihari Makineni, et al. Exploring the cache design space for large scale CMPs, 2005, CARN.
[7] Jian Li, et al. Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors, 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[8] Donald Yeung, et al. Understanding Multicore Cache Behavior of Loop-based Parallel Programs via Reuse Distance Analysis, 2012.
[9] Chen Ding, et al. Linear-time Modeling of Program Working Set in Shared Cache, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[10] Chen Ding, et al. A Composable Model for Analyzing Locality of Multi-threaded Programs, 2009.
[11] Xipeng Shen, et al. Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?, 2010, CC.
[12] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[13] Derek L. Schuff, et al. Multicore-aware reuse distance analysis, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[14] Kunle Olukotun, et al. Maximizing CMP throughput with mediocre cores, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[15] Berkin Özisikyilmaz, et al. MineBench: A Benchmark Suite for Data Mining Workloads, 2006, 2006 IEEE International Symposium on Workload Characterization.
[16] Chen Ding, et al. Miss rate prediction across all program inputs, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[17] Kevin Skadron, et al. CMP design space exploration subject to physical constraints, 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[18] Collin McCurdy, et al. Using Pin as a memory reference generator for multiprocessor simulation, 2005, CARN.
[19] Apan Qasem, et al. Evaluating a Model for Cache Conflict Miss Prediction, 2005.
[20] Milind Kulkarni, et al. Towards architecture independent metrics for multicore performance analysis, 2011, PERV.
[21] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[22] Kristof Beyls, et al. Reuse Distance as a Metric for Cache Behavior, 2001.