A Cost-Effective Distribution-Aware Data Replication Scheme for Parallel I/O Systems

As the data volumes of high-performance computing applications continue to grow, poor I/O performance has become a critical bottleneck for these data-intensive applications. Data replication is a promising approach to improving parallel I/O performance. However, most existing strategies assume that contiguous requests are served more efficiently than non-contiguous requests, which is not necessarily true in a parallel I/O system. The reason is that distributing data across multiple servers makes it indeterminate whether contiguous or non-contiguous requests are the more favorable access pattern. In this study, we propose CEDA, a cost-effective distribution-aware data replication scheme to better support parallel I/O systems. Because logical file access information is insufficient for making replication decisions in a parallel environment, CEDA considers physical data accesses on servers in both data selection and data placement during a parallel replication process. Specifically, CEDA first proposes a distribution-aware cost model to evaluate the file request time under a given data layout, and then carries out cost-effective data replication based on replication benefit analysis. For high portability, we have implemented CEDA as part of the MPI I/O library on top of the OrangeFS file system. By replaying representative benchmarks and a real application, we collected comprehensive experimental results on both HDD- and SSD-based servers, and we conclude that CEDA can significantly improve parallel I/O system performance.
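The claim that contiguous requests are not always the cheaper ones in a parallel I/O system can be illustrated with a small sketch. This is not CEDA's cost model, which the abstract does not specify. It assumes a simple round-robin striping layout and a hypothetical per-server cost of one fixed startup delay per piece served plus transfer time, with the request finishing when the slowest server finishes. The function names and all parameter values are illustrative assumptions.

```python
from collections import defaultdict


def per_server_bytes(offset, size, stripe_size, num_servers):
    """Map a logical byte range to the bytes it touches on each server,
    assuming round-robin striping (an assumption, not CEDA's layout)."""
    loads = defaultdict(int)
    pos, end = offset, offset + size
    while pos < end:
        stripe_index = pos // stripe_size
        server = stripe_index % num_servers
        chunk_end = min(end, (stripe_index + 1) * stripe_size)
        loads[server] += chunk_end - pos
        pos = chunk_end
    return dict(loads)


def request_cost(pieces, stripe_size, num_servers, startup_s, bandwidth_bps):
    """Estimate the time of a request made of (offset, size) pieces.

    Hypothetical model: each server pays one startup delay per piece it
    serves plus its transfer time; servers work in parallel, so the
    request time is the maximum over the servers involved.
    """
    startups = defaultdict(int)
    total_bytes = defaultdict(int)
    for offset, size in pieces:
        loads = per_server_bytes(offset, size, stripe_size, num_servers)
        for server, nbytes in loads.items():
            startups[server] += 1
            total_bytes[server] += nbytes
    if not total_bytes:
        return 0.0
    return max(
        startups[s] * startup_s + total_bytes[s] / bandwidth_bps
        for s in total_bytes
    )


STRIPE = 64 * 1024   # illustrative stripe size in bytes
SERVERS = 4          # illustrative server count
STARTUP = 0.005      # illustrative startup delay in seconds
BANDWIDTH = 100e6    # illustrative bandwidth in bytes per second

# One contiguous 64 KiB request that falls entirely on a single server.
contiguous = request_cost([(0, STRIPE)], STRIPE, SERVERS, STARTUP, BANDWIDTH)

# The same amount of data as two 32 KiB pieces on two different servers.
non_contiguous = request_cost(
    [(0, STRIPE // 2), (STRIPE, STRIPE // 2)],
    STRIPE, SERVERS, STARTUP, BANDWIDTH,
)
```

Under these assumptions the non-contiguous request is estimated to finish sooner, because its two pieces are transferred by two servers in parallel while the contiguous request is served by one. With a different stripe size or offset the ordering can reverse, which is why the physical distribution, not the logical pattern alone, has to be taken into account.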
