Computer Architecture Performance Evaluation Methods

Performance evaluation is at the foundation of computer architecture research and development. Contemporary microprocessors are so complex that architects cannot design systems based on intuition and simple models only. Adequate performance evaluation methods are absolutely crucial to steer the research and development process in the right direction. However, rigorous performance evaluation is non-trivial as there are multiple aspects to performance evaluation, such as picking workloads, selecting an appropriate modeling or simulation approach, running the model and interpreting the results using meaningful metrics. Each of these aspects is equally important and a performance evaluation method that lacks rigor in any of these crucial aspects may lead to inaccurate performance data and may drive research and development in a wrong direction. The goal of this book is to present an overview of the current state-of-the-art in computer architecture performance evaluation, with a special emphasis on methods for exploring processor architectures. The book focuses on fundamental concepts and ideas for obtaining accurate performance data. The book covers various topics in performance evaluation, ranging from performance metrics, to workload selection, to various modeling approaches including mechanistic and empirical modeling. And because simulation is by far the most prevalent modeling technique, more than half the book’s content is devoted to simulation. The book provides an overview of the simulation techniques in the computer designer’s toolbox, followed by various simulation acceleration techniques including sampled simulation, statistical simulation, parallel simulation and hardware-accelerated simulation.

[1]  Paolo Faraboschi,et al.  An Adaptive Synchronization Technique for Parallel Simulation of Networked Clusters , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.

[2]  Kunle Olukotun,et al.  The case for a single-chip multiprocessor , 1996, ASPLOS VII.

[3]  S. Hadjitodorov,et al.  Empirical versus mechanistic modelling: Comparison of an artificial neural network to a mechanistically based model for quantitative structure pharmacokinetic relationships of a homologous series of barbiturates , 1999, AAPS PharmSci.

[4]  Krste Asanovic,et al.  Branch trace compression for snapshot-based simulation , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[5]  Edward M. Riseman,et al.  The Inhibition of Potential Parallelism by Conditional Jumps , 1972, IEEE Transactions on Computers.

[6]  Dam Sunwoo,et al.  Accurate Functional-First Multicore Simulators , 2009, IEEE Computer Architecture Letters.

[7]  Olivier Temam,et al.  UNISIM: An Open Simulation Environment and Library for Complex Architecture Design and Collaborative Development , 2007, IEEE Computer Architecture Letters.

[8]  James E. Smith,et al.  Characterizing computer performance with a single number , 1988, CACM.

[9]  Sally A. McKee,et al.  Methods of inference and learning for performance modeling of parallel applications , 2007, PPoPP.

[10]  David A. Wood,et al.  Full-system timing-first simulation , 2002, SIGMETRICS '02.

[11]  Alan Jay Smith,et al.  Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.

[12]  Babak Falsafi,et al.  Modeling cost/performance of a parallel computer simulator , 1997, TOMC.

[13]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[14]  Amer Diwan,et al.  The DaCapo benchmarks: java benchmarking development and analysis , 2006, OOPSLA '06.

[15]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[16]  Manoj Franklin,et al.  Balancing thoughput and fairness in SMT processors , 2001, 2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS..

[17]  Lizy Kurian John,et al.  Improved automatic testcase synthesis for performance model validation , 2005, ICS '05.

[18]  Erik Hagersten,et al.  Fast data-locality profiling of native execution , 2005, SIGMETRICS '05.

[19]  Lieven Eeckhout,et al.  Considering all starting points for simultaneous multithreading simulation , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[20]  Lieven Eeckhout,et al.  How java programs interact with virtual machines at the microarchitectural level , 2003, OOPSLA 2003.

[21]  M. Hill,et al.  Optimistic Simulation of Parallel Architectures Using Program Executables , 1996, Proceedings of Symposium on Parallel and Distributed Tools.

[22]  Thomas F. Wenisch,et al.  Simulation sampling with live-points , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[23]  Stijn Eyerman,et al.  Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[24]  Peter K. Szwed,et al.  SimSnap: fast-forwarding via native execution and application-level checkpointing , 2004, Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004..

[25]  Paolo Faraboschi,et al.  COTSon: infrastructure for full system simulation , 2009, OPSR.

[26]  David Keppel,et al.  Shade: a fast instruction-set simulator for execution profiling , 1994, SIGMETRICS.

[27]  Lieven Eeckhout,et al.  Hybrid analytical-statistical modeling for efficiently exploring architecture and workload design spaces , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[28]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[29]  John Paul Shen,et al.  A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[30]  Mikko H. Lipasti,et al.  Modern Processor Design: Fundamentals of Superscalar Processors , 2002 .

[31]  Douglas M. Hawkins,et al.  A statistically rigorous approach for improving simulation methodology , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[32]  Santosh G. Abraham,et al.  Efficient simulation of caches under optimal replacement with applications to miss characterization , 1993, SIGMETRICS '93.

[33]  Ronald G. Dreslinski,et al.  The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.

[34]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[35]  Dam Sunwoo,et al.  QUICK: A flexible full-system functional model , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[36]  David A. Patterson,et al.  RAMP: research accelerator for multiple processors - a community vision for a shared experimental parallel HW/SW platform , 2006, ISPASS.

[37]  Lieven Eeckhout,et al.  BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation , 2005, Comput. J..

[38]  Jean-Loup Baer,et al.  Trace Sampling for Desktop Applications on Windows NT , 2000 .

[39]  Jean-Loup Baer,et al.  On the use of trace sampling for architectural studies of desktop applications , 1999, SIGMETRICS '99.

[40]  Lieven Eeckhout,et al.  Branch Predictor Warmup for Sampled Simulation through Branch History Matching , 2009, Trans. High Perform. Embed. Archit. Compil..

[41]  Lieven Eeckhout,et al.  Control flow modeling in statistical simulation for accurate and efficient processor design studies , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[42]  Mendel Rosenblum,et al.  Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.

[43]  Doug Burger,et al.  Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.

[44]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[45]  Jianwei Chen,et al.  SlackSim: a platform for parallel simulations of CMPs on CMPs , 2009, CARN.

[46]  Lieven Eeckhout,et al.  Statistically rigorous java performance evaluation , 2007, OOPSLA.

[47]  Eric E. Johnson,et al.  Lossless Trace Compression , 2001, IEEE Trans. Computers.

[48]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[49]  Massoud Pedram,et al.  Microprocessor power estimation using profile-driven program synthesis , 1998, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[50]  Mikko H. Lipasti,et al.  Value locality and load value prediction , 1996, ASPLOS VII.

[51]  Lieven Eeckhout,et al.  Measuring benchmark similarity using inherent program characteristics , 2006, IEEE Transactions on Computers.

[52]  Kapil Vaswani,et al.  A Predictive Performance Model for Superscalar Processors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[53]  Kapil Vaswani,et al.  Construction and use of linear regression models for processor performance analysis , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[54]  David M. Brooks,et al.  Efficiency trends and limits from comprehensive microarchitectural adaptivity , 2008, ASPLOS.

[55]  L. Eeckhout,et al.  Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[56]  James E. Smith,et al.  Statistical Simulation: Adding Efficiency to the Computer Designer's Toolbox , 2003, IEEE Micro.

[57]  Brad Calder,et al.  Reproducible simulation of multi-threaded workloads for architecture design exploration , 2008, 2008 IEEE International Symposium on Workload Characterization.

[58]  Theodore Antonakopoulos,et al.  An Instruction Throughput Model of Superscalar Processors , 2003 .

[59]  James E. Smith,et al.  A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[60]  Arvind,et al.  Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.

[61]  D. Patterson,et al.  Performance characterization of a quad Pentium Pro SMP using OLTP workloads , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[62]  Lieven Eeckhout,et al.  Automated microprocessor stressmark generation , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[63]  David M. Brooks,et al.  CPR: Composable performance regression for scalable multiprocessor models , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[64]  Harvey G. Cragon Computer Architecture and Implementation , 2000 .

[65]  Janak H. Patel,et al.  Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems , 1988, IEEE Trans. Computers.

[66]  Lieven Eeckhout,et al.  Efficient Sampling Startup for SimPoint , 2006, IEEE Micro.

[67]  D. Citron MisSPECulation: partial and misleading use of spec CPU2000 in computer architecture conferences , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[68]  Dam Sunwoo,et al.  FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators , 2007, MICRO.

[69]  Sarita V. Adve,et al.  Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[70]  Aamer Jaleel,et al.  Cross Binary Simulation Points , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[71]  T. Puzak,et al.  The optimum pipeline depth for a microprocessor , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[72]  Lieven Eeckhout,et al.  Microarchitecture-Independent Workload Characterization , 2007, IEEE Micro.

[73]  Brian Fahs,et al.  Microarchitecture optimizations for exploiting memory-level parallelism , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[74]  Philip G. Emma,et al.  Understanding some simple processor-performance limits , 1997, IBM J. Res. Dev..

[75]  Richard M. Fujimoto,et al.  Direct execution models of processor behavior and performance , 1987, WSC '87.

[76]  Brad Calder,et al.  Structures for phase classification , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[77]  Scott Devine,et al.  Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.

[78]  Raj Jain,et al.  The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling , 1991, Wiley professional computing.

[79]  Richard A. Johnson,et al.  Applied Multivariate Statistical Analysis , 1983 .

[80]  W. Paul,et al.  Computer Architecture , 2000, Springer Berlin Heidelberg.

[81]  Olivier Temam,et al.  MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[82]  David I. August,et al.  Microarchitectural exploration with Liberty , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[83]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[84]  James E. Smith,et al.  Statistical simulation of symmetric multiprocessor systems , 2002, Proceedings 35th Annual Simulation Symposium. SS 2002.

[85]  Lieven Eeckhout,et al.  Distilling the essence of proprietary workloads into miniature benchmarks , 2008, TACO.

[86]  James R. Larus,et al.  Fast out-of-order processor simulation using memoization , 1998, ASPLOS VIII.

[87]  Maged M. Michael,et al.  Accuracy and speed-up of parallel trace-driven architectural simulation , 1997, Proceedings 11th International Parallel Processing Symposium.

[88]  Tor M. Aamodt,et al.  Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[89]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.

[90]  Brad Calder,et al.  The Strong correlation Between Code Signatures and Performance , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[91]  David A. Wood,et al.  A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.

[92]  Stijn Eyerman,et al.  System-level Performance Metrics for Multiprogram Workloads Assessing the Performance of Multiprogram Workloads Running on Multithreaded Hardware Is Difficult Because It Involves a Balance between Single-program Performance and Overall System Performance. This Article Argues for Developing Multiprog , 2008 .

[93]  James E. Smith,et al.  Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[94]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[95]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[96]  James R. Larus,et al.  EEL: machine-independent executable editing , 1995, PLDI '95.

[97]  James K. Archibald,et al.  Cache coherence protocols: evaluation using a multiprocessor simulation model , 1986, TOCS.

[98]  David I. August,et al.  Exploiting parallelism and structure to accelerate the simulation of chip multi-processors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[99]  James E. Smith,et al.  Advanced Micro Devices , 2005 .

[100]  Michael F. P. O'Boyle,et al.  Microarchitectural Design Space Exploration Using an Architecture-Centric Approach , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[101]  Krste Asanovic,et al.  Accelerating Multiprocessor Simulation with a Memory Timestamp Record , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[102]  Dean M. Tullsen,et al.  Initial observations of the simultaneous multithreading Pentium 4 processor , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[103]  Louise Trevillyan,et al.  Representative traces for processor models with infinite cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[104]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[105]  John R. Mashey,et al.  War of the benchmark means: time for a truce , 2004, CARN.

[106]  Kunle Olukotun,et al.  STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.

[107]  Kevin Skadron,et al.  Accelerated warmup for sampled microarchitecture simulation , 2005, TACO.

[108]  Douglas W. Clark,et al.  A Characterization of Processor Performance in the vax-11/780 , 1984, ISCA '84.

[109]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[110]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[111]  Brad Calder,et al.  Picking statistically valid and early simulation points , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[112]  Philip J. Fleming,et al.  How not to lie with statistics: the correct way to summarize benchmark results , 1986, CACM.

[113]  Josep Torrellas,et al.  A direct-execution framework for fast and accurate simulation of superscalar processors , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[114]  Lieven Eeckhout,et al.  Efficient Sampling Startup for Sampled Processor Simulation , 2005, HiPEAC.

[115]  Stijn Eyerman,et al.  Efficient Design Space Exploration of High Performance Embedded Out-of-Order Processors , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[116]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[117]  David A. Wood,et al.  Variability in architectural simulations of multi-threaded workloads , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[118]  James R. Larus,et al.  The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.

[119]  G. Braun,et al.  A universal technique for fast and flexible instruction-set architecture simulation , 2002, Proceedings 2002 Design Automation Conference (IEEE Cat. No.02CH37324).

[120]  Satish Narayanasamy,et al.  Automatic logging of operating system effects to guide application-level architecture simulation , 2006, SIGMETRICS '06/Performance '06.

[121]  Hyesoon Kim,et al.  An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[122]  Thomas F. Wenisch,et al.  SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.

[123]  Gary Lauterbach Accelerating architectural simulation by parallel execution of trace samples , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[124]  Margaret Martonosi,et al.  Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques , 1999, IEEE Trans. Computers.

[125]  Lizy Kurian John,et al.  Efficiently Evaluating Speedup Using Sampled Processor Simulation , 2004, IEEE Computer Architecture Letters.

[126]  James E. Smith,et al.  Characterizing the branch misprediction penalty , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[127]  David J. Lilja,et al.  Measuring computer performance : A practitioner's guide , 2000 .

[128]  Tao Li,et al.  Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis , 2008, 2008 IEEE International Symposium on Workload Characterization.

[129]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[130]  Babak Falsafi,et al.  ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs , 2009, TRETS.

[131]  James E. Smith,et al.  A performance counter architecture for computing accurate CPI components , 2006, ASPLOS XII.

[132]  Alan D. George,et al.  Parallel simulation of chip-multiprocessor architectures , 2002, TOMC.

[133]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[134]  Greg Hamerly,et al.  SimPoint 3.0: Faster and More Flexible Program Analysis , 2005 .

[135]  Tejas Karkhanis,et al.  A Day in the Life of a Data Cache Miss , 2002 .

[136]  David M. Brooks,et al.  Illustrative Design Space Studies with Microarchitectural Regression Models , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[137]  Albert Cohen,et al.  DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time , 2003, SIGMETRICS '03.

[138]  Lieven Eeckhout,et al.  Chip Multiprocessor Design Space Exploration through Statistical Simulation , 2009, IEEE Transactions on Computers.

[139]  Christopher J. Hughes,et al.  RSIM: Simulating Shared-Memory Multiprocessors with ILP Processors , 2002, Computer.

[140]  Martin Burtscher,et al.  The VPC trace-compression algorithms , 2005, IEEE Transactions on Computers.

[141]  David A. Bader,et al.  BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[142]  Brad Calder,et al.  Selecting software phase markers with code structure analysis , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[143]  Lieven Eeckhout,et al.  Performance Evaluation and Benchmarking , 2005 .

[144]  Trevor N. Mudge,et al.  Intrinsic Checkpointing: A Methodology for Decreasing Simulation Time Through Binary Modification , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[145]  Mark D. Hill,et al.  Amdahl's Law in the Multicore Era , 2008 .

[146]  T. Kuhn,et al.  The Structure of Scientific Revolutions. , 1964 .

[147]  Rajiv Kapoor,et al.  Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[148]  Martin Burtscher,et al.  Automatic Synthesis of High-Speed Processor Simulators , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[149]  Lieven Eeckhout,et al.  Evaluating Benchmark Subsetting Approaches , 2006, 2006 IEEE International Symposium on Workload Characterization.

[150]  Lieven Eeckhout,et al.  The exigency of benchmark and compiler drift: designing tomorrow's processors with yesterday's tools , 2006, ICS '06.

[151]  Lieven Eeckhout,et al.  Representative Multiprogram Workloads for Multithreaded Processor Simulation , 2007, 2007 IEEE 10th International Symposium on Workload Characterization.

[152]  Lieven Eeckhout,et al.  Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces , 2008, IEEE Transactions on Computers.

[153]  Ann Marie Grizzaffi Maynard,et al.  Contrasting characteristics and cache performance of technical and multi-user commercial workloads , 1994, ASPLOS VI.

[154]  Lixin Zhang,et al.  Mambo: a full system simulator for the PowerPC architecture , 2004, PERV.

[155]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[156]  Jeff Reilly "Evolve or Die: Making SPEC's CPU Suite Relevant Today and Tomorrow" , 2006, 2006 IEEE International Symposium on Workload Characterization.

[157]  Perry Cheng,et al.  Myths and realities: the performance impact of garbage collection , 2004, SIGMETRICS '04/Performance '04.

[158]  Lizy Kurian John,et al.  Analysis of redundancy and application balance in the SPEC CPU2006 benchmark suite , 2007, ISCA '07.

[159]  Thomas F. Wenisch,et al.  Statistical sampling of microarchitecture simulation , 2006, IPDPS.

[160]  John Paul Shen,et al.  An integrated functional performance simulator , 1999, IEEE Micro.

[161]  Frederic T. Chong,et al.  HLS: combining statistical and symbolic simulation to guide microprocessor designs , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[162]  Lieven Eeckhout,et al.  SMA: A Self-Monitored Adaptive Cache Warm-Up Scheme for Microprocessor Simulation , 2005, International Journal of Parallel Programming.

[163]  Nikil D. Dutt,et al.  Instruction set compiled simulation: a technique for fast and flexible instruction set simulation , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[164]  André Seznec,et al.  Choosing representative slices of program execution for microarchitecture simulations: a preliminary , 2000 .

[165]  Lizy Kurian John,et al.  More on finding a single number to indicate overall performance of a benchmark suite , 2004, CARN.

[166]  Mikko H. Lipasti,et al.  Redeeming IPC as a performance metric for multithreaded programs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[167]  James E. Smith,et al.  Automated design of application specific superscalar processors: an analytical approach , 2007, ISCA '07.

[168]  Onur Mutlu,et al.  Understanding the effects of wrong-path memory references on processor performance , 2004, WMPI '04.

[169]  Lieven Eeckhout,et al.  Quantifying the Impact of Input Data Sets on Program Behavior and its Applications , 2003, J. Instr. Level Parallelism.

[170]  Douglas M. Hawkins,et al.  Characterizing and comparing prevailing simulation techniques , 2005, 11th International Symposium on High-Performance Computer Architecture.

[171]  Per Stenström,et al.  Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[172]  Matt T. Yourst PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[173]  Edward S. Davidson,et al.  Performance evaluation of highly concurrent computers by deterministic simulation , 1976, CACM.

[174]  Thomas M. Conte,et al.  Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.

[175]  Joel S. Emer Accelerating architecture research , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[176]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[177]  Lieven Eeckhout,et al.  Performance analysis through synthetic trace generation , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).

[178]  R. M. Fujimoto,et al.  Parallel discrete event simulation , 1989, WSC '89.

[179]  Intel Corp,et al.  Virtualization Without Direct Execution or Jitting: Designing a Portable Virtual Machine Infrastructure , 2008 .

[180]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[181]  Stéphan Jourdan,et al.  Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[182]  David A. Wood,et al.  A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.

[183]  Joel Emer,et al.  AWB : The Asim Architect ' s Workbench , 2007 .

[184]  David A. Wood,et al.  IPC Considered Harmful for Multiprocessor Workloads , 2006, IEEE Micro.

[185]  Brad Calder,et al.  A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[186]  James R. Larus,et al.  Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator , 2000, IEEE Concurr..

[187]  Thomas M. Conte,et al.  Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation , 1998, IEEE Trans. Computers.