Modeling Out-of-Order Superscalar Processor Performance Quickly and Accurately with Traces
[1] Mikko H. Lipasti,et al. Can trace-driven simulators accurately predict superscalar performance? , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[2] Yannis Smaragdakis,et al. Flexible reference trace reduction for VM simulations , 2003, TOMC.
[3] David B. Papworth. Tuning the Pentium Pro microarchitecture , 1996, IEEE Micro.
[4] R.H. Katz,et al. A characterization of sharing in parallel programs and its application to coherency protocol evaluation , 1988, The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.
[5] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[6] Trevor N. Mudge,et al. Trace-driven memory simulation: a survey , 1997, CSUR.
[7] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[8] Sangyeun Cho,et al. Accurately approximating superscalar processor performance from traces , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[9] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[10] Pradeep Dubey,et al. Platform 2015: Intel® Processor and Platform Evolution for the Next Decade , 2005 .
[11] Lieven Eeckhout,et al. Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces , 2008, IEEE Transactions on Computers.
[12] Shunfei Chen,et al. MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).
[13] Kevin Skadron,et al. CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006.
[14] Susan J. Eggers,et al. Techniques for efficient inline tracing on a shared-memory multiprocessor , 1990, SIGMETRICS '90.
[15] James R. Larus,et al. Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator , 2000, IEEE Concurr..
[16] David J. Lilja,et al. Measuring computer performance : A practitioner's guide , 2000 .
[17] Per Stenström,et al. Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[18] Gabriel H. Loh. A time-stamping algorithm for efficient performance estimation of superscalar processors , 2001, SIGMETRICS '01.
[19] Onur Mutlu,et al. Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[20] Rastislav Bodík,et al. Focusing processor policies via critical-path prediction , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.
[21] Maurice V. Wilkes,et al. The memory wall and the CMOS end-point , 1995, CARN.
[22] Mike Johnson,et al. Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.
[23] Eric Rotenberg,et al. Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[24] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[25] Sangyeun Cho,et al. In-N-Out: Reproducing Out-of-Order Superscalar Processor Behavior from Reduced In-Order Traces , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.
[26] Fabrice Bellard,et al. QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX Annual Technical Conference, FREENIX Track.
[27] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[28] Sally A. McKee,et al. Hitting the memory wall: implications of the obvious , 1995, CARN.
[29] A. J. KleinOsowski,et al. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.
[30] Philip Bitar,et al. A Critique of Trace-Driven Simulation for Shared-Memory Multiprocessors , 1990 .
[31] John L. Hennessy,et al. Efficient performance prediction for modern microprocessors , 2000, SIGMETRICS '00.
[32] Rami G. Melhem,et al. Scalable Multi-cache Simulation Using GPUs , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.
[33] Roland E. Wunderlich,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings.
[34] Brian Fahs,et al. Microarchitecture optimizations for exploiting memory-level parallelism , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[35] Lizy Kurian John,et al. Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[36] James E. Smith,et al. The future of simulation: a field of dreams , 2006, Computer.
[37] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[38] James E. Smith,et al. Statistical Simulation: Adding Efficiency to the Computer Designer's Toolbox , 2003, IEEE Micro.
[39] John Paul Shen,et al. A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[40] Mikko H. Lipasti,et al. Modern Processor Design: Fundamentals of Superscalar Processors , 2002 .
[41] Todd M. Austin,et al. SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.
[42] James E. Smith,et al. Statistical simulation of symmetric multiprocessor systems , 2002, Proceedings 35th Annual Simulation Symposium. SS 2002.
[43] Rastislav Bodík,et al. Interaction cost and shotgun profiling , 2004, TACO.
[44] Thomas F. Wenisch,et al. Simulation sampling with live-points , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[45] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[46] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[47] Leslie A. Barnes. Performance Modeling and Analysis for AMD's High Performance Microprocessors , 2007, ISPASS.
[48] Lieven Eeckhout,et al. Measuring benchmark similarity using inherent program characteristics , 2006, IEEE Transactions on Computers.
[49] Stéphan Jourdan,et al. An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors , 2004, International Journal of Parallel Programming.
[50] Tor M. Aamodt,et al. Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[51] Thomas F. Wenisch,et al. An Evaluation of Stratified Sampling of Microarchitecture Simulations , 2004 .
[52] Burzin A. Patel,et al. Optimization of instruction fetch mechanisms for high issue rates , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[53] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[54] James E. Smith,et al. Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[55] James R. Larus,et al. Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[56] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[57] Kunle Olukotun,et al. Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.
[58] Sangyeun Cho,et al. Accurately modeling superscalar processor performance with reduced trace , 2013, J. Parallel Distributed Comput..
[59] Mayan Moudgill,et al. Environment for PowerPC microarchitecture exploration , 1999, IEEE Micro.
[60] John L. Hennessy,et al. The accuracy of trace-driven simulations of multiprocessors , 1993, SIGMETRICS '93.
[61] Anant Agarwal,et al. Blocking: exploiting spatial locality for trace compaction , 1990, SIGMETRICS '90.
[62] Laszlo A. Belady,et al. A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..
[63] Louise Trevillyan,et al. Representative traces for processor models with infinite cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[64] Hyunjin Lee,et al. Two-phase trace-driven simulation (TPTS): a fast multicore processor architecture simulation approach , 2010, Softw. Pract. Exp..
[65] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[66] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[67] Michel Dubois,et al. Cache inclusion and processor sampling in multiprocessor simulations , 1993, SIGMETRICS '93.
[68] Pascal Sainrat,et al. Multiple-block ahead branch predictors , 1996, ASPLOS VII.
[69] James R. Larus,et al. Efficient program tracing , 1993, Computer.
[70] Stijn Eyerman,et al. Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[71] Sangyeun Cho. I-cache multi-banking and vertical interleaving , 2007, GLSVLSI '07.
[72] Lesley Anne Polka. Package Technology to Address the Memory Bandwidth Challenge for Terascale Computing , 2007 .
[73] James E. Smith,et al. Advanced Micro Devices , 2005 .
[74] Wen-Hann Wang,et al. Efficient trace-driven simulation methods for cache performance analysis , 1991, TOCS.
[75] K. Kavi. Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .
[76] Hyunjin Lee,et al. CloudCache: Expanding and shrinking private caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.