Modeling a Leadership-Scale Storage System

Exascale supercomputers will have the potential for billion-way parallelism. While physical implementations of these systems are not yet available, HPC system designers can develop models of exascale systems to evaluate candidate design points. Modeling these systems and their associated subsystems is a significant challenge. In this paper, we present the Co-design of Exascale Storage System (CODES) framework for evaluating exascale storage system design points. As part of our early work with CODES, we discuss the use of the framework to simulate leadership-scale storage systems in a tractable amount of time using parallel discrete-event simulation. We describe the storage system models and protocols currently included with the CODES framework and demonstrate the use of CODES through simulations of an existing petascale storage system.
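The core technique the abstract names, discrete-event simulation, is easy to illustrate. Below is a minimal, self-contained C sketch of a sequential discrete-event simulation of a single storage server under a fixed arrival pattern. The event types, the service-time constant, and every other name here are illustrative assumptions, not the CODES or ROSS API; a parallel engine like the one CODES builds on distributes and synchronizes exactly this kind of timestamped event queue across processors.

```c
/* Minimal sequential discrete-event simulation of one storage server.
 * Illustrative sketch only: all names and constants are assumptions,
 * not part of the CODES framework. */
#include <stdio.h>
#include <stdlib.h>

#define NUM_REQUESTS 8
#define SERVICE_TIME 4.0          /* assumed per-request service time (ms) */

typedef enum { EV_ARRIVAL, EV_COMPLETION } ev_kind;

typedef struct event {
    double ts;                    /* simulated timestamp */
    ev_kind kind;
    int req_id;
    struct event *next;
} event;

static event *queue = NULL;       /* pending events, sorted by timestamp */

static void schedule(double ts, ev_kind kind, int req_id)
{
    event *e = malloc(sizeof *e);
    e->ts = ts; e->kind = kind; e->req_id = req_id;
    event **p = &queue;
    while (*p && (*p)->ts <= ts)  /* insert in timestamp order */
        p = &(*p)->next;
    e->next = *p;
    *p = e;
}

int main(void)
{
    double busy_until = 0.0;      /* server state: time it next frees up */

    /* Requests arrive every 3 ms but take 4 ms to serve, so a
     * queueing delay accumulates over the run. */
    for (int i = 0; i < NUM_REQUESTS; i++)
        schedule(3.0 * i, EV_ARRIVAL, i);

    while (queue) {               /* classic event-processing loop */
        event *e = queue;
        queue = e->next;
        switch (e->kind) {
        case EV_ARRIVAL: {
            double start = e->ts > busy_until ? e->ts : busy_until;
            busy_until = start + SERVICE_TIME;
            schedule(busy_until, EV_COMPLETION, e->req_id);
            break;
        }
        case EV_COMPLETION:
            printf("request %d done at t=%5.1f ms\n", e->req_id, e->ts);
            break;
        }
        free(e);
    }
    return 0;
}
```

A parallel discrete-event engine takes this same model, partitions entities such as servers and clients across processes, and synchronizes their event queues (optimistically, in the Time Warp family of protocols) so that events are still processed in timestamp order; that is what makes simulating a leadership-scale storage system tractable.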
