Summarizing multiprocessor program execution with versatile, microarchitecture-independent snapshots
[1] Krste Asanovic,et al. Accelerating Multiprocessor Simulation with a Memory Timestamp Record , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[2] Louise Trevillyan,et al. Representative traces for processor models with infinite cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[3] Aleksandar Milenkovic,et al. Demystifying Intel Branch Predictors , 2005 .
[4] Luis Ceze,et al. Full Circle: Simulating Linux Clusters on Linux Clusters , 2003 .
[5] Amir Roth,et al. DISE: a programmable macro engine for customizing applications , 2003, ISCA '03.
[6] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[7] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[8] Michael D. Smith,et al. Tracing with Pixie , 1991 .
[9] Ian H. Witten,et al. Data Compression Using Adaptive Coding and Partial String Matching , 1984, IEEE Trans. Commun..
[10] Martin Burtscher,et al. Compressing extended program traces using value predictors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[11] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[12] Alan Jay Smith,et al. A class of compatible cache consistency protocols and their support by the IEEE futurebus , 1986, ISCA '86.
[13] Douglas W. Clark,et al. A Characterization of Processor Performance in the VAX-11/780 , 1984, ISCA '84.
[14] Alaa R. Alameldeen,et al. Addressing Workload Variability in Architectural Simulations , 2003, IEEE Micro.
[15] Gary Peterson,et al. UltraSPARC-I , 1995, DAC '95.
[16] S. McFarling. Combining Branch Predictors , 1993 .
[17] Todd M. Austin,et al. The SimpleScalar tool set, version 2.0 , 1997, CARN.
[18] Babak Falsafi,et al. ProtoFlex: Co-simulation for Component-wise FPGA Emulator Development , 2006 .
[19] Thomas F. Wenisch,et al. SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture , 2004, PERV.
[20] Alan Jay Smith,et al. Efficient Analysis of Caching Systems , 1987 .
[21] Margaret Martonosi,et al. Speculative Updates of Local and Global Branch History: A Quantitative Analysis , 2000, J. Instr. Level Parallelism.
[22] David A. Wood,et al. A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.
[23] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[24] K. Kavi. Cache Memories Cache Memories in Uniprocessors. Reading versus Writing. Improving Performance , 2022 .
[25] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[26] Jeanine Cook,et al. Fast, Accurate Microarchitecture Simulation Using Statistical Phase Detection , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[27] Brad Calder,et al. A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on Performance Analysis of Systems and Software, 2004. ISPASS 2004.
[28] James Archibald,et al. BACH: BYU Address Collection Hardware, The Collection of Complete Traces , 1992 .
[29] Gabriel H. Loh. Revisiting the performance impact of branch predictor latencies , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[30] Lixin Zhang,et al. Mambo: a full system simulator for the PowerPC architecture , 2004, PERV.
[31] Thomas M. Conte,et al. Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation , 1998, IEEE Trans. Computers.
[32] Dmitry A. Shkarin,et al. PPM: one step to practicality , 2002, Proceedings DCC 2002. Data Compression Conference.
[33] Sarita V. Adve,et al. Improving the accuracy vs. speed tradeoff for simulating shared-memory multiprocessors with ILP processors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[34] R. L. Sites,et al. ATUM: a new technique for capturing address traces using microcode , 1986, ISCA '86.
[35] Martin Burtscher,et al. Automatic generation of high-performance trace compressors , 2005, International Symposium on Code Generation and Optimization.
[36] James R. Goodman,et al. Speculative lock elision: enabling highly concurrent multithreaded execution , 2001, MICRO.
[37] Frederic T. Chong,et al. HLS: combining statistical and symbolic simulation to guide microprocessor designs , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[38] Lieven Eeckhout,et al. Considering all starting points for simultaneous multithreading simulation , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[39] Lieven Eeckhout,et al. Efficient Sampling Startup for Sampled Processor Simulation , 2005, HiPEAC.
[40] Larry Rudolph,et al. Cooperative checkpointing: a robust approach to large-scale systems reliability , 2006, ICS '06.
[41] Laxmikant V. Kalé,et al. BigSim: a parallel simulator for performance prediction of extremely large parallel machines , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[42] Eriko Nurvitadhi,et al. Design, implementation, and verification of active cache emulator (ACE) , 2006, FPGA '06.
[43] Thomas F. Wenisch,et al. Simulation sampling with live-points , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[44] Sandhya Dwarkadas,et al. Efficient Simulation of Parallel Computer Systems , 1991, Int. J. Comput. Simul..
[45] Andrew R. Cherenson,et al. The Sprite network operating system , 1988, Computer.
[46] Mark Horowitz,et al. Cache performance of operating system and multiprogramming workloads , 1988, TOCS.
[47] Onur Mutlu,et al. Runahead execution: an alternative to very large instruction windows for out-of-order processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..
[48] André Seznec,et al. Choosing representative slices of program execution for microarchitecture simulations: a preliminary , 2000 .
[49] Anant Agarwal,et al. Blocking: exploiting spatial locality for trace compaction , 1990, SIGMETRICS '90.
[50] Josep Torrellas,et al. The Augmint multiprocessor simulation toolkit for Intel x86 architectures , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[51] Thomas F. Wenisch,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, ISCA '03.
[52] Thomas F. Wenisch,et al. SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.
[53] Trevor N. Mudge,et al. Intrinsic Checkpointing: A Methodology for Decreasing Simulation Time Through Binary Modification , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[54] A. Dain Samples,et al. Mache: no-loss trace compaction , 1989, SIGMETRICS '89.
[55] Rajiv Kapoor,et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[56] M. Milenkovic,et al. Exploiting streams in instruction and data address trace compression , 2003, 2003 IEEE International Conference on Communications.
[57] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[58] Cong Fu,et al. The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors , 2005, CARN.
[59] Karel Driesen,et al. The cascaded predictor: economical and adaptive branch target prediction , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[60] G. Blelloch. Introduction to Data Compression , 2022 .
[61] David R. Jefferson,et al. Virtual time , 1985, ICPP.
[62] James R. Larus,et al. The Wisconsin Wind Tunnel: virtual prototyping of parallel computers , 1993, SIGMETRICS '93.
[63] Daniel A. Jiménez,et al. Dynamic branch prediction with perceptrons , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[64] David A. Patterson,et al. Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .
[65] Per Stenström,et al. Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..
[66] Peter K. Szwed,et al. SimSnap: fast-forwarding via native execution and application-level checkpointing , 2004, Eighth Workshop on Interaction between Compilers and Computer Architectures, 2004. INTERACT-8 2004..
[67] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[68] David A. Patterson,et al. RAMP: research accelerator for multiple processors - a community vision for a shared experimental parallel HW/SW platform , 2006, ISPASS.
[69] Mikko H. Lipasti,et al. Redeeming IPC as a performance metric for multithreaded programs , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[70] William T. C. Kramer,et al. Performance Variability of Highly Parallel Architectures , 2003, International Conference on Computational Science.
[71] Ming Wang,et al. Hardware emulation for functional verification of K5 , 1996, DAC '96.
[72] Fredrik Larsson,et al. SimGen: Development of Efficient Instruction Set Simulators , 1997 .
[73] Daniel A. Jiménez. Idealized Piecewise Linear Branch Prediction , 2005, J. Instr. Level Parallelism.
[74] Kevin Skadron,et al. Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation , 2002 .
[75] Derek Chiou,et al. FPGA-based Fast, Cycle-Accurate, Full-System Simulators , 2006 .
[76] Karel Driesen,et al. Accurate indirect branch prediction , 1998, ISCA.
[77] Thomas M. Conte,et al. Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.
[78] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[79] Richard Uhlig,et al. SoftSDV: A Presilicon Software Development Environment for the IA-64 Architecture , 1999 .
[80] Alan Jay Smith,et al. Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.
[81] Mendel Rosenblum,et al. Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.
[82] Rabin A. Sugumar,et al. Multi-configuration simulation algorithms for the evaluation of computer architecture designs , 1993 .
[83] Anoop Gupta,et al. Complete computer system simulation: the SimOS approach , 1995, IEEE Parallel Distributed Technol. Syst. Appl..
[84] Michel Dubois,et al. The Design of RPM: An FPGA-based Multiprocessor Emulator , 1995, Third International ACM Symposium on Field-Programmable Gate Arrays.
[85] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[86] Andy D. Pimentel,et al. Distributed simulation of multicomputer architectures with Mermaid , 1997 .
[87] Peter K. Szwed,et al. Application-level checkpointing for shared memory programs , 2004, ASPLOS XI.
[88] Nathan L. Binkert,et al. Network-Oriented Full-System Simulation using M5 , 2003 .
[89] Yale N. Patt,et al. Alternative implementations of two-level adaptive branch prediction , 1992, ISCA '92.
[90] Xidong Wang,et al. Lock Behavior Characterization of Commercial Workloads , 2002 .
[91] Janak H. Patel,et al. Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems , 1988, IEEE Trans. Computers.
[92] James E. Smith,et al. Statistical simulation of symmetric multiprocessor systems , 2002, Proceedings 35th Annual Simulation Symposium. SS 2002.
[93] Brad Calder,et al. Loop Termination Prediction , 2000, ISHPC.
[94] Martin Burtscher,et al. The VPC trace-compression algorithms , 2005, IEEE Transactions on Computers.
[95] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[96] Laxmikant V. Kalé,et al. NAMD: Biomolecular Simulation on Thousands of Processors , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[97] Dan Tsafrir,et al. System noise, OS clock ticks, and fine-grained parallel applications , 2005, ICS '05.
[98] Jakob Engblom,et al. A Fully Virtual Multi-Node 1553 Bus Computer System , 2006 .
[99] J. Robert Jump,et al. The Rice parallel processing testbed , 1988, SIGMETRICS '88.
[100] Pierre Michaud,et al. A PPM-like, Tag-based Predictor , 2005 .
[101] D. Skinner,et al. Understanding the causes of performance variability in HPC workloads , 2005, Proceedings of the IEEE International Workload Characterization Symposium, 2005.
[102] Mikko H. Lipasti,et al. Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.
[103] Trevor N. Mudge,et al. Trace-driven memory simulation: a survey , 1997, CSUR.
[104] Gary Lauterbach. Accelerating architectural simulation by parallel execution of trace samples , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.
[105] Margaret Martonosi,et al. Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques , 1999, IEEE Trans. Computers.
[106] André Seznec,et al. Analysis of the O-GEometric history length branch predictor , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[107] Xiangyu Zhang,et al. Whole Execution Traces , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[108] A. J. KleinOsowski,et al. MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.
[109] Krste Asanovic,et al. Branch trace compression for snapshot-based simulation , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.
[110] Janak H. Patel,et al. A low-overhead coherence solution for multiprocessors with private cache memories , 1984, ISCA '84.
[111] M. Valero,et al. A Novel Evaluation Methodology to Obtain Fair Measurements in Multithreaded Architectures , 2006 .