Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?
Xipeng Shen | Eddy Z. Zhang | Yunlian Jiang | Kai Tian
[1] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[2] Erik Hagersten,et al. Fast data-locality profiling of native execution , 2005, SIGMETRICS '05.
[3] John Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004 .
[4] Trevor Hastie,et al. The Elements of Statistical Learning , 2001 .
[5] Harold S. Stone,et al. Footprints in the cache , 1986, SIGMETRICS '86/PERFORMANCE '86.
[6] Michael Stumm,et al. Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors , 2007, EuroSys '07.
[7] Tor M. Aamodt,et al. A first-order fine-grained multithreaded throughput model , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[8] Yan Solihin,et al. Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.
[9] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[10] Chen Ding,et al. Locality approximation using time , 2007, POPL '07.
[11] Chen Ding,et al. All-window profiling of concurrent executions , 2008, PPoPP.
[12] Steve Carr,et al. Feedback-directed memory disambiguation through store distance analysis , 2006, ICS '06.
[13] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000, SIGP.
[14] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI '03.
[15] Alan Jay Smith,et al. On the effectiveness of set associative page mapping and its application to main memory management , 1976, ICSE '76.
[16] Wentao Chang,et al. Sampling-based program locality approximation , 2008, ISMM '08.
[17] John M. Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.
[18] Kristof Beyls,et al. Reuse Distance as a Metric for Cache Behavior. , 2001 .
[19] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] G. Edward Suh,et al. Analytical cache models with applications to cache partitioning , 2001, ICS '01.
[21] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[22] Dean M. Tullsen,et al. Symbiotic jobscheduling for a simultaneous multithreading processor , 2000 .
[23] Chen Ding,et al. Regression-Based Multi-Model Prediction of Data Reuse Signature , 2003 .
[24] Chen Ding,et al. Miss Rate Prediction Across Program Inputs and Cache Configurations , 2007, IEEE Transactions on Computers.
[25] Peter J. Denning,et al. Thrashing: its causes and prevention , 1968, AFIPS Fall Joint Computing Conference.
[26] Kristof Beyls,et al. Discovery of Locality-Improving Refactorings by Reuse Path Analysis , 2006, HPCC.
[27] Erik Berg,et al. Fast data-locality profiling of native execution , 2005 .
[28] Xipeng Shen,et al. Scalable Implementation of Efficient Locality Approximation , 2008, LCPC.
[29] Won-Taek Lim,et al. Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[30] Michael D. Smith,et al. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[31] Chen Ding,et al. Program locality analysis using reuse distance , 2009, TOPL.
[32] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[33] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[34] Steve Carr,et al. Instruction based memory distance analysis and its application to optimization , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[35] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[36] Chen Ding,et al. Array regrouping and structure splitting using whole-program reference affinity , 2004, PLDI '04.
[37] Zhao Zhang,et al. Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[38] Alex Settle,et al. Architectural Support for Enhanced SMT Job Scheduling , 2004, IEEE PACT.
[39] Chen Ding,et al. Miss rate prediction across all program inputs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[40] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[41] Barbara M. Chapman,et al. Evaluating OpenMP on Chip MultiThreading Platforms , 2005, IWOMP.