Performance Debugging and Tuning using an Instruction-Set Simulator

Instruction-set simulators give programmers detailed insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction-set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic, allowing bugs to be recreated and measurements to be repeated. Though such simulators are often viewed as too slow for use as a general programming tool, their performance has improved considerably in the last several years. We describe SimICS, an instruction-set simulator of SPARC-based multiprocessors developed at SICS, in its role as a general programming tool. We discuss some of the benefits of using a tool such as SimICS to support various software engineering tasks, including debugging, testing, analysis, and performance tuning. We present in some detail two case studies in which we used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over an order of magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems. (NOTE: A later version of this report was published in ILPS'97.)
