The BlueGene/L pseudo cycle-accurate simulator

The design and development of a new computer system is a lengthy process, and considerable time elapses between the start of development and first hardware availability. Fast and reasonably accurate simulation of the processor architecture has therefore become a critical enabling mechanism for software engineers who need to develop and tune system software and applications. In this paper, we present the time-stamped timing model extensions to the BlueGene/L functional simulator. These extensions were implemented to create a pseudo cycle-accurate simulator that provides tracing capabilities for detecting bottlenecks and tuning application performance before the actual hardware became available. Our validation tests, using the DAXPY kernel and the serial version of the NAS benchmarks, show that the pseudo cycle-accurate simulator provides timing information within 15% of the times measured on the actual BlueGene/L hardware. In addition, we present a couple of case studies that describe how the simulator can be used to identify performance bottlenecks and tune applications.