An Abstraction Methodology for the Evaluation of Multi-core Multi-threaded Architectures

As the evolution of multi-core multi-threaded processors continues, the complexity demanded to perform an extensive trade-off analysis, increases proportionally. Cycle-accurate or trace-driven simulators are too slow to execute the large amount of experiments required to obtain indicative results. To achieve a thorough analysis of the system, software benchmarks or traces are required. In many cases when an analysis is needed most, during the earlier stages of the processor design, benchmarks or traces are not available. Analytical models overcome these limitations but do not provide the fine grain details needed for a deep analysis of these architectures. In this work we present a new methodology to abstract processor architectures, at a level between cycle-accurate and analytical simulators. To apply our methodology we use queueing modeling techniques. Thus, we introduce Q-MAS, a queueing based tool targeting a real chip (the Ultra SPARC T2 processor) and aimed at facilitating the quanti?cation of trade-offs during the design phase of multi-core multi-threaded processor architectures. The results demonstrate that Q-MAS, the tool that we developed, provides accurate results very close to the actual hardware, with a minimal cost of running what-if scenarios.

[1]  Francisco J. Cazorla,et al.  FAME: FAirly MEasuring Multithreaded Architectures , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[2]  Vidhyacharan Bhaskar,et al.  A closed queuing network model with multiple servers for multi-threaded architecture , 2008, Comput. Commun..

[3]  Tor M. Aamodt,et al.  A first-order fine-grained multithreaded throughput model , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[4]  Rafael Asenjo,et al.  Analytical Modeling of Pipeline Parallelism , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[5]  Stijn Eyerman,et al.  Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[6]  Lieven Eeckhout,et al.  Control flow modeling in statistical simulation for accurate and efficient processor design studies , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[7]  Vladimir Vlassov,et al.  Analytical modeling of multithreaded architectures , 2000, J. Syst. Archit..

[8]  Mauricio J. Serrano,et al.  Performance estimation of multistreamed, superscalar processors , 1994, 1994 Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences.

[9]  Thin-Fong Tsuei,et al.  Queuing Simulation Model for Multiprocessor Systems , 2003, Computer.

[10]  Hyesoon Kim,et al.  An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[11]  John Paul Shen,et al.  A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[12]  Roland E. Wunderlich,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[13]  James E. Smith,et al.  Modeling superscalar processors via statistical simulation , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[14]  Lieven Eeckhout,et al.  Chip Multiprocessor Design Space Exploration through Statistical Simulation , 2009, IEEE Transactions on Computers.

[15]  James E. Smith,et al.  A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[16]  Thomas M. Conte,et al.  Reducing state loss for effective trace sampling of superscalar processors , 1996, Proceedings International Conference on Computer Design. VLSI in Computers and Processors.

[17]  Giuseppe Serazzi,et al.  JMT: performance engineering tools for system modeling , 2009, PERV.

[18]  David E. Culler,et al.  An Analytical Solution for a Markov Chain Modeling Multithreaded Execution , 1991 .

[19]  Brad Calder,et al.  A co-phase matrix to guide simultaneous multithreading simulation , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[20]  Theodore Antonakopoulos,et al.  An Instruction Throughput Model of Superscalar Processors , 2003 .

[21]  Francisco J. Cazorla,et al.  Characterizing the resource-sharing levels in the UltraSPARC T2 processor , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).