Paper information - GpuTejas: A parallel simulator for GPU architectures

GpuTejas: A parallel simulator for GPU architectures

In this paper, we introduce a new Java-based parallel GPGPU simulator, GpuTejas. GpuTejas is a fast trace-driven simulator that uses relaxed synchronization and non-blocking data structures to derive its speedups. It also introduces a novel scheduling and partitioning scheme for parallelizing a GPU simulator. We evaluate the performance of our simulator with a set of Rodinia benchmarks. We demonstrate a mean speedup of 17.33x with 64 threads over sequential execution, and a speedup of 429x over the widely used simulator GPGPU-Sim. We validated our timing and simulation model by comparing our results with a native system (NVIDIA Tesla M2070). Compared to the sequential version of GpuTejas, the parallel version has an error limited to <7.67% for our suite of benchmarks, which is similar to the numbers reported by competing parallel simulators.

Smruti R. Sarangi | Seep Goel | Geetika Malhotra
