An Exploration into the Effectiveness of Prefetching on Program Performance with the Help of an Autotuning Model
[1] John L. Henning. SPEC CPU2006 benchmark descriptions, 2006, CARN.
[2] Mahmut T. Kandemir,et al. Adaptive prefetching for shared cache based chip multiprocessors , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[3] Simha Sethumadhavan,et al. Approximate graph clustering for program characterization , 2012, TACO.
[4] Carole-Jean Wu,et al. PACMan: Prefetch-Aware Cache Management for high performance caching , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[6] Martin Burtscher,et al. On the importance of optimizing the configuration of stream prefetchers , 2005, MSP '05.
[7] Jennifer L. Wong,et al. To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach , 2013, ASPLOS '13.
[8] Apan Qasem,et al. Exposing Tunable Parameters in Multi-threaded Numerical Code , 2010, NPC.
[9] Donald Nguyen,et al. Machine learning-based prefetch optimization for data center applications , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[10] Onur Mutlu,et al. Coordinated control of multiple prefetchers in multi-core systems , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Dean M. Tullsen,et al. Inter-core prefetching for multicore processors using migrating helper threads , 2011, ASPLOS XVI.
[12] Berkin Özisikyilmaz,et al. MineBench: A Benchmark Suite for Data Mining Workloads , 2006, 2006 IEEE International Symposium on Workload Characterization.
[13] Yanbin Liu,et al. Detection of false sharing using machine learning , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Onur Mutlu,et al. Prefetch-aware shared-resource management for multi-core systems , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[15] Donald Yeung,et al. BioBench: A Benchmark Suite of Bioinformatics Applications , 2005, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2005).
[16] Pen-Chung Yew,et al. Multiprocessor cache design considerations , 1987, ISCA '87.
[17] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[18] Vijayalakshmi Srinivasan,et al. When prefetching improves/degrades performance , 2005, CF '05.
[19] Simha Sethumadhavan,et al. Rapid identification of architectural bottlenecks via precise event counting , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[20] Michael F. P. O'Boyle,et al. Rapidly Selecting Good Compiler Optimizations using Performance Counters , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[21] 直野 健,et al. Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010.
[22] Jean-Loup Baer,et al. Effective Hardware Based Data Prefetching for High-Performance Processors , 1995, IEEE Trans. Computers.
[23] Mahmut T. Kandemir,et al. A compiler-directed data prefetching scheme for chip multiprocessors , 2009, PPoPP '09.
[24] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[25] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[26] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[27] Miodrag Potkonjak,et al. MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[28] Yen-Kuang Chen,et al. The ALPBench benchmark suite for complex multimedia applications , 2005, Proceedings of the IEEE Workload Characterization Symposium, 2005.
[29] Michael F. P. O'Boyle,et al. MILEPOST GCC: machine learning based research compiler, 2008.
[30] Gerhard Wellein,et al. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments , 2010, 2010 39th International Conference on Parallel Processing Workshops.
[31] Richard W. Vuduc,et al. When Prefetching Works, When It Doesn’t, and Why , 2012, TACO.
[32] Collin McCurdy,et al. Characterizing the Impact of Prefetching on Scientific Application Performance , 2013, PMBS@SC.
[33] Jean-Loup Baer,et al. A performance study of software and hardware data prefetching schemes , 1994, ISCA '94.
[34] Zhenman Fang,et al. Multi-stage coordinated prefetching for present-day processors , 2014, ICS '14.
[35] David J. Lilja,et al. Data prefetch mechanisms , 2000, CSUR.