Route-aware independent MPI I/O on the Blue Gene/Q

Scalable, high-performance I/O is crucial for application performance on large-scale systems. With the growing complexity of system interconnects, it has become important to consider the impact of network contention on I/O performance, because I/O messages traverse several hops in the interconnect before reaching the I/O nodes or the file system. In this work, we present a route-aware and load-aware algorithm that modifies the existing bridge-node assignment on the Blue Gene/Q (BG/Q) supercomputer. Our approach reduces network contention and lowers write time by an average of 60% over the default independent I/O and by 20% over collective I/O on up to 8,192 nodes of the Mira BG/Q system. The algorithm routes 1.4x fewer messages through the bridge nodes, which connect the compute nodes to the I/O nodes on the BG/Q.
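The bridge-node reassignment described above can be pictured as a greedy mapping from compute nodes to nearby, lightly loaded bridge nodes. The C sketch below illustrates that idea only; it is not the paper's implementation, and the Node structure, the hop_distance() metric, and the tie-breaking rule are illustrative assumptions.

    /* Minimal sketch (assumed, not the authors' code) of a route- and
     * load-aware bridge-node assignment: each compute node is mapped to
     * the candidate bridge node with the smallest hop distance, and ties
     * are broken in favor of the bridge node with the fewest compute
     * nodes already assigned to it. */
    #include <limits.h>
    #include <stdlib.h>

    typedef struct { int coords[5]; } Node;   /* BG/Q 5-D torus coordinates */

    /* Hypothetical hop count: Manhattan distance over the five torus
     * dimensions, ignoring wrap-around links for simplicity. */
    static int hop_distance(const Node *a, const Node *b) {
        int d = 0;
        for (int i = 0; i < 5; i++) {
            int diff = a->coords[i] - b->coords[i];
            d += diff < 0 ? -diff : diff;
        }
        return d;
    }

    /* assign[i] receives the index of the bridge node chosen for compute node i. */
    void assign_bridge_nodes(const Node *compute, int n_compute,
                             const Node *bridge, int n_bridge, int *assign) {
        int *load = calloc(n_bridge, sizeof(int));  /* compute nodes per bridge node */
        for (int i = 0; i < n_compute; i++) {
            int best = -1, best_hops = INT_MAX;
            for (int b = 0; b < n_bridge; b++) {
                int h = hop_distance(&compute[i], &bridge[b]);
                if (h < best_hops || (h == best_hops && load[b] < load[best])) {
                    best = b;
                    best_hops = h;
                }
            }
            assign[i] = best;
            load[best]++;
        }
        free(load);
    }

A production assignment would also have to account for torus wrap-around routes and per-I/O-node bandwidth limits, which this sketch omits.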
