Transit: A Visual Analytical Model for Multithreaded Machines

With the extraordinary growth in the number of cores and threads in today's multithreaded machines, analyzing and tuning the performance of such platforms has become a challenging task. In this paper, we propose an intuitive, visualizable model for analyzing the performance of contemporary, highly concurrent multithreaded machines. Based on balancing the flow between the service demand and the service supply of the memory system, the model draws a figure that characterizes the machine state, identifies bottlenecks, and indicates optimization directions. The model is also tractable: it requires only two parameters from the workload. It achieves 90% and 83% prediction accuracy for computation throughput on Fermi and Kepler GPUs, respectively, across 16 applications from the Rodinia benchmark suite.
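The flow-balancing idea can be illustrated with textbook asymptotic bounds from queueing-network analysis. This is a minimal sketch under stated assumptions, not the model proposed here. The abstract does not name its two workload parameters, so `compute_cycles` and `insts_per_request` are hypothetical stand-ins, as are the machine parameters and the function `flow_balance`. Demand is the rate at which resident threads issue memory requests when the memory system is unloaded. Supply is the memory system's peak service rate. The delivered rate is the smaller of the two, and Little's law gives the thread count at which they balance.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    threads: int          # N: concurrently resident threads (hypothetical parameter)
    mem_latency: float    # L: unloaded memory round-trip latency, in cycles
    mem_bandwidth: float  # B: max memory requests the memory system serves per cycle


@dataclass
class Workload:
    # Hypothetical workload parameters; the paper's actual two are not named here.
    compute_cycles: float     # Z: cycles of computation between memory requests
    insts_per_request: float  # instructions executed per memory request


def flow_balance(m: Machine, w: Workload) -> dict:
    """Asymptotic-bounds sketch: delivered request rate = min(demand, supply)."""
    cycle = w.compute_cycles + m.mem_latency
    # Demand side: each thread issues one request every (Z + L) cycles
    demand = m.threads / cycle
    # Supply side: the memory system cannot serve more than B requests per cycle
    supply = m.mem_bandwidth
    rate = min(demand, supply)
    # Little's law: threads needed in flight to keep the memory system saturated
    saturation_threads = supply * cycle
    return {
        "request_rate": rate,
        "compute_throughput": rate * w.insts_per_request,  # instructions per cycle
        "bottleneck": "latency (add threads)" if demand < supply
        else "bandwidth (cut memory traffic)",
        "saturation_threads": saturation_threads,
    }


if __name__ == "__main__":
    w = Workload(compute_cycles=100.0, insts_per_request=20.0)
    print(flow_balance(Machine(threads=32, mem_latency=400.0, mem_bandwidth=0.1), w))
    print(flow_balance(Machine(threads=64, mem_latency=400.0, mem_bandwidth=0.1), w))
```

With 32 threads, demand (32 / 500 = 0.064 requests per cycle) falls short of supply (0.1), so the sketch reports a latency bottleneck and says that about 50 threads would be needed to saturate memory. With 64 threads, demand exceeds supply, and throughput is capped by bandwidth. The crossover point is where a visual model of this kind would place its "knee".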
