Orthrus: A Framework for Implementing Efficient Collective I/O in Multi-core Clusters

Optimizing access patterns with collective I/O imposes the overhead of exchanging data between processes. In a multi-core-based cluster, the costs of inter-node and intra-node data communication are vastly different, and this heterogeneity in data-exchange efficiency poses both a challenge and an opportunity for implementing efficient collective I/O. The opportunity lies in exploiting fast intra-node communication: we propose to improve communication locality for greater data-exchange efficiency. However, such an effort is at odds with improving access locality for I/O efficiency, which can also be critical to collective-I/O performance. To address this issue we propose a framework, Orthrus, that accommodates multiple collective-I/O implementations, each optimized for particular performance aspects, and dynamically selects the best-performing one according to the current workload and system conditions. We have implemented Orthrus in the ROMIO library. Our experimental results with representative MPI-IO benchmarks on both a small dedicated cluster and a large production HPC system show that Orthrus can significantly improve collective-I/O performance under various workload and system scenarios.
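The framework's core idea, selecting among alternative collective-I/O implementations at run time, can be sketched roughly as follows. The sketch is illustrative only: the function names (adaptive_coll_write, write_comm_locality, write_access_locality) and the timing-based selection policy are assumptions made for exposition, not the actual Orthrus or ROMIO interface, and both placeholder variants simply delegate to MPI_File_write_at_all.

    /* Illustrative sketch of run-time selection among collective-write variants.
     * All names and the selection policy are hypothetical. */
    #include <mpi.h>

    typedef int (*coll_write_fn)(MPI_File, MPI_Offset, const void *, int, MPI_Datatype);

    static int write_comm_locality(MPI_File fh, MPI_Offset off, const void *buf,
                                   int count, MPI_Datatype type)
    {
        /* Placeholder: a real variant would first aggregate data within each node
         * to exploit fast intra-node communication (communication locality). */
        return MPI_File_write_at_all(fh, off, buf, count, type, MPI_STATUS_IGNORE);
    }

    static int write_access_locality(MPI_File fh, MPI_Offset off, const void *buf,
                                     int count, MPI_Datatype type)
    {
        /* Placeholder: a real variant would partition file domains so each
         * aggregator issues large contiguous requests (access locality). */
        return MPI_File_write_at_all(fh, off, buf, count, type, MPI_STATUS_IGNORE);
    }

    /* Hypothetical dispatcher: probe each candidate once, then keep using the
     * one whose worst-rank elapsed time was lowest for the observed workload. */
    int adaptive_coll_write(MPI_File fh, MPI_Offset off, const void *buf,
                            int count, MPI_Datatype type, MPI_Comm comm)
    {
        static coll_write_fn candidates[2] = { write_comm_locality,
                                               write_access_locality };
        static double cost[2] = { 0.0, 0.0 };
        static int calls = 0;

        int pick = (calls < 2) ? calls                       /* probing phase */
                               : (cost[0] <= cost[1] ? 0 : 1);

        double t0 = MPI_Wtime();
        int rc = candidates[pick](fh, off, buf, count, type);
        double local = MPI_Wtime() - t0, worst;

        /* A collective write completes only when the slowest rank finishes, so
         * candidates are ranked by their maximum elapsed time across all ranks. */
        MPI_Allreduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, comm);
        cost[pick] = worst;
        calls++;
        return rc;
    }

A production selector would keep re-evaluating as the access pattern changes rather than fixing the decision after two calls, but the overall structure mirrors what the abstract describes: several interchangeable collective-I/O back ends behind a single interface, plus a run-time selector.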
