Gopinath Chennupati | Stephan Eidenbenz | Abdel-Hameed A. Badawy | Nandakishore Santhi | Atanu Barai
[1] T.G. Venkatesh, et al. Analytical Derivation of Concurrent Reuse Distance Profile for Multi-Threaded Application Running on Chip Multi-Processor, 2019, IEEE Transactions on Parallel and Distributed Systems.
[2] Yutao Zhong, et al. Predicting whole-program locality through reuse distance analysis, 2003, PLDI.
[3] Lieven Eeckhout, et al. Modeling Superscalar Processor Memory-Level Parallelism, 2018, IEEE Computer Architecture Letters.
[4] Zhen Yang, et al. Modeling and Stack Simulation of CMP Cache Capacity and Accessibility, 2009, IEEE Transactions on Parallel and Distributed Systems.
[5] Jaehyuk Huh, et al. Exploring the design space of future CMPs, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[6] Zhe Wang, et al. Fast and Accurate Exploration of Multi-level Caches Using Hierarchical Reuse Distance, 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[7] Erik Hagersten, et al. A statistical multiprocessor cache model, 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[8] Xipeng Shen, et al. Is Reuse Distance Applicable to Data Locality Analysis on Chip Multiprocessors?, 2010, CC.
[9] Chen Ding, et al. Program locality analysis using reuse distance, 2009, ACM Transactions on Programming Languages and Systems (TOPLAS).
[10] David A. Padua, et al. Estimating cache misses and locality using stack distances, 2003, ICS '03.
[11] Chen Ding, et al. A Composable Model for Analyzing Locality of Multi-threaded Programs, 2009.
[12] Kristof Beyls, et al. Reuse Distance as a Metric for Cache Behavior, 2001.
[13] Gopinath Chennupati, et al. Scalable Performance Prediction of Codes with Memory Hierarchy and Pipelines, 2019, SIGSIM-PADS.
[14] Satyajayant Misra, et al. A Scalable Analytical Memory Model for CPU Performance Prediction, 2017, PMBS@SC.
[15] Irving L. Traiger, et al. Evaluation Techniques for Storage Hierarchies, 1970, IBM Syst. J.
[16] Chen Ding, et al. Locality approximation using time, 2007, POPL '07.
[17] David Black-Schaffer, et al. Formalizing Data Locality in Task Parallel Applications, 2016, ICA3PP Workshops.
[18] Cong Xu, et al. Moguls: A model to explore the memory hierarchy for bandwidth improvements, 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[19] Chen Ding, et al. Reuse Distance Analysis, 2001.
[20] Gopinath Chennupati, et al. An analytical memory hierarchy model for performance prediction, 2017, 2017 Winter Simulation Conference (WSC).
[21] Kunle Olukotun, et al. Maximizing CMP throughput with mediocre cores, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[22] Bronis R. de Supinski, et al. A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, 2010, IWOMP.
[23] Chen Ding, et al. Miss Rate Prediction Across Program Inputs and Cache Configurations, 2007, IEEE Transactions on Computers.
[24] Vijay Janapa Reddi, et al. PIN: a binary instrumentation tool for computer architecture research and education, 2004, WCAE '04.
[25] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization (CGO 2004).
[26] Per Stenström, et al. Performance and power impact of issue-width in chip-multiprocessor cores, 2003, 2003 International Conference on Parallel Processing, Proceedings.
[27] David A. Wood, et al. Reuse-based online models for caches, 2013, SIGMETRICS '13.
[28] David Black-Schaffer, et al. Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics, 2016, IEEE Transactions on Computers.
[29] Mateo Valero, et al. Improving Cache Management Policies Using Dynamic Reuse Distances, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[31] Nicholas Nethercote, et al. Valgrind: a framework for heavyweight dynamic binary instrumentation, 2007, PLDI '07.
[32] Donald Yeung, et al. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[33] Milind Kulkarni, et al. Accelerating multicore reuse distance analysis with sampling and parallelization, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[34] Derek L. Schuff, et al. Multicore-aware reuse distance analysis, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW).
[35] 包斌, et al. Performance Metrics and Models for Shared Cache, 2014.
[36] Erik Hagersten, et al. StatCache: a probabilistic approach to efficient and accurate data locality analysis, 2004, 2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).