Improvements to the Structural Simulation Toolkit

Designing supercomputer architectures and applications is becoming more difficult because of their increased size and complexity, new technologies, and new constraints such as power and thermal limits. The Structural Simulation Toolkit (SST) is an architectural simulation framework designed to assist in the design, evaluation, and optimization of High Performance Computing (HPC) architectures and applications. Its initial release included a parallel simulation core with a number of system component models. Since then, the SST has been expanded and improved: new memory, network, and processor models have been added, as well as new high-level system simulation capabilities. Scalability results are also presented.
