Generalization of the decremental performance analysis to differential analysis (Généralisation de l'analyse de performance décrémentale vers l'analyse différentielle).
[1] Raj Jain,et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.
[2] Ayal Zaks,et al. Swing Modulo Scheduling for GCC , 2004 .
[3] William Jalby,et al. Hardware Performance Monitoring for the Rest of Us: A Position and Survey , 2011, NPC.
[4] James Goodman,et al. MESIF: A Two-Hop Cache Coherency Protocol for Point-to-Point Interconnects (2004) , 2004 .
[5] Qin Zhao,et al. Transparent dynamic instrumentation , 2012, VEE '12.
[6] Andres Charif Rubial,et al. Performance Tuning of x86 OpenMP Codes with MAQAO , 2009, Parallel Tools Workshop.
[7] Norman P. Jouppi,et al. Multi-Core Cache Hierarchies , 2011, Multi-Core Cache Hierarchies.
[8] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux , 2010 .
[9] Mary Lou Soffa,et al. Low overhead program monitoring and profiling , 2005, PASTE '05.
[10] Shai Rubin,et al. Focusing processor policies via critical-path prediction , 2001, ISCA 2001.
[11] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[12] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .
[13] F. A. Seiler,et al. Numerical Recipes in C: The Art of Scientific Computing , 1989 .
[14] William Jalby,et al. MicroTools: Automating Program Generation and Performance Measurement , 2012, 2012 41st International Conference on Parallel Processing Workshops.
[15] Barbara Chapman,et al. Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation) , 2007 .
[16] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[17] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[18] John Paul Shen,et al. Theoretical modeling of superscalar processor performance , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] John M. Mellor-Crummey,et al. Cross-architecture performance predictions for scientific applications using parameterized models , 2004, SIGMETRICS '04/Performance '04.
[20] Michael Frumkin,et al. The OpenMP Implementation of NAS Parallel Benchmarks and its Performance , 2013 .
[21] András Vajda. Programming Many-Core Chips , 2011 .
[22] Stefanos Kaxiras,et al. Interval-based models for run-time DVFS orchestration in superscalar processors , 2010, CF '10.
[23] William Jalby,et al. Quantifying performance bottleneck cost through differential analysis , 2013, ICS '13.
[24] Sangkyum Kim,et al. ADP: automated diagnosis of performance pathologies using hardware events , 2012, SIGMETRICS '12.
[25] Robert B. Ross,et al. Using MPI-2: Advanced Features of the Message Passing Interface , 2003, CLUSTER.
[26] Yu Chen,et al. A New Algorithm for Identifying Loops in Decompilation , 2007, SAS.
[27] Andres Charif Rubial,et al. CQA: A code quality analyzer tool at binary level , 2014, 2014 21st International Conference on High Performance Computing (HiPC).
[28] R. Campbell,et al. Automated Fingerprinting of Performance Pathologies Using Performance Monitoring Units (PMUs) , 2011 .
[29] Emery D. Berger,et al. STABILIZER: statistically sound performance evaluation , 2013, ASPLOS '13.
[30] John M. Mellor-Crummey,et al. A new approach for performance analysis of openMP programs , 2013, ICS '13.
[31] William Jalby,et al. A Balanced Approach to Application Performance Tuning , 2009, LCPC.
[32] Matthias Hauswirth,et al. Producing wrong data without doing anything obviously wrong! , 2009, ASPLOS.
[33] Derek Bruening,et al. An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[34] B. Schimmelpfennig,et al. Quantum chemical and molecular dynamics study of the coordination of Th(IV) in aqueous solvent. , 2010, The journal of physical chemistry. B.
[35] David A. Wood,et al. A Primer on Memory Consistency and Cache Coherence , 2012, Synthesis Lectures on Computer Architecture.
[36] James Reinders,et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism , 2007 .
[37] Lars Koesterke,et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[38] Edward M. McCreight. The Dragon Computer System , 1985 .
[39] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[40] Dhabaleswar K. Panda,et al. Microbenchmark performance comparison of high-speed cluster interconnects , 2004, IEEE Micro.
[41] Martin Burtscher,et al. AutoSCOPE: Automatic Suggestions for Code Optimizations using PerfExpert , 2011 .
[42] Stéphan Jourdan,et al. An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors , 2004, International Journal of Parallel Programming.
[43] Shirley Moore,et al. Non-determinism and overcount on modern hardware performance counter implementations , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[44] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[45] Allen D. Malony,et al. The Tau Parallel Performance System , 2006, Int. J. High Perform. Comput. Appl..
[46] Kim M. Hazelwood,et al. SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[47] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org , 2010 .
[48] Andres Charif Rubial,et al. MIL: A language to build program analysis tools through static binary instrumentation , 2013, 20th Annual International Conference on High Performance Computing.
[49] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[50] Rastislav Bodík,et al. Slack: maximizing performance under technological constraints , 2002, ISCA.
[51] Cédric Valensi. A generic approach to the definition of low-level components for multi-architecture binary analysis , 2014 .
[52] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[53] Michael Laurenzano,et al. PEBIL: Efficient static binary instrumentation for Linux , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[54] R. M. Tomasulo,et al. An efficient algorithm for exploiting multiple arithmetic units , 1995 .
[55] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[56] Matthias Hauswirth,et al. Accuracy of performance counter measurements , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[57] David J. Kuck. Computational Capacity-Based Codesign of Computer Systems , 2012, High-Performance Scientific Computing.
[58] H. Peter Hofstee,et al. Introduction to the Cell multiprocessor , 2005, IBM J. Res. Dev..
[59] Rastislav Bodík,et al. Using Interaction Costs for Microarchitectural Bottleneck Analysis , 2003, MICRO.
[60] Sadaf R. Alam,et al. Characterization of Scientific Workloads on Systems with Multi-Core Processors , 2006, 2006 IEEE International Symposium on Workload Characterization.
[61] Jack Dongarra,et al. Integrated Tool Capabilities for Performance Instrumentation and Measurement , .
[62] Sally A. McKee,et al. Can hardware performance counters be trusted? , 2008, 2008 IEEE International Symposium on Workload Characterization.
[63] William Jalby,et al. Simsys: a performance simulation framework , 2013, RAPIDO '13.
[64] Iro Pantazi-Mytarelli. The history and use of pipelining computer architecture: MIPS pipelining implementation , 2013, 2013 IEEE Long Island Systems, Applications and Technology Conference (LISAT).
[65] Carl Staelin,et al. lmbench: Portable Tools for Performance Analysis , 1996, USENIX Annual Technical Conference.
[66] Rastislav Bodík,et al. Interaction cost: for when event counts just don't add up , 2004, IEEE Micro.