A prediction-based dynamic file assignment strategy for parallel file systems

An analysis model of file assignment and access is generalized. A load prediction model for parallel file systems is proposed. A prediction-based dynamic file assignment strategy (PDFA) is proposed. We evaluate the effectiveness of the proposed algorithms.

The rapid development of the internet calls for high-performance file systems, and considerable effort has already been devoted to assigning nonpartitioned files in a parallel file system so that requests are answered promptly. Yet most existing strategies still fail to achieve optimal system mean response time, so new strategies that perform better on this metric are needed for parallel file systems. This paper addresses the assignment of nonpartitioned files in parallel file systems where file accesses exhibit Poisson arrival rates and fixed service times. It presents an on-line file assignment strategy, named prediction-based dynamic file assignment (PDFA), which aims to minimize the mean response time among disks under different workload conditions, and it compares PDFA with well-known file assignment algorithms such as HP and SOR. Comprehensive experimental results show that PDFA consistently achieves a better mean response time than all of the algorithms it is compared against.
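The workload assumption stated above (Poisson arrivals, fixed per-file service times) means that each disk can be modeled as an M/G/1 queue, whose mean response time follows from the Pollaczek-Khinchine formula. The sketch below is only an illustration of that standard queueing model, not the PDFA algorithm or the paper's analysis model; the names `File`, `disk_mean_response`, and `system_mean_response` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class File:
    """Hypothetical workload record for one nonpartitioned file."""
    rate: float     # Poisson arrival rate of requests (requests per second)
    service: float  # fixed service time of one request (seconds)


def disk_mean_response(files: Sequence[File]) -> float:
    """Mean response time of one disk, modeled as an M/G/1 queue.

    Uses the Pollaczek-Khinchine formula:
        T = E[S] + lam * E[S^2] / (2 * (1 - rho))
    where lam is the total arrival rate and rho = lam * E[S].
    """
    lam = sum(f.rate for f in files)
    if lam == 0:
        return 0.0  # an idle disk serves no requests
    rho = sum(f.rate * f.service for f in files)  # disk utilization
    if rho >= 1:
        return float("inf")  # overloaded disk: the queue grows without bound
    mean_s = rho / lam
    mean_s2 = sum(f.rate * f.service ** 2 for f in files) / lam
    return mean_s + lam * mean_s2 / (2 * (1 - rho))


def system_mean_response(disks: Sequence[Sequence[File]]) -> float:
    """Arrival-rate-weighted mean response time over all disks."""
    total = sum(f.rate for disk in disks for f in disk)
    if total == 0:
        return 0.0
    weighted = sum(
        sum(f.rate for f in disk) * disk_mean_response(disk) for disk in disks
    )
    return weighted / total


if __name__ == "__main__":
    a, b = File(rate=0.4, service=1.0), File(rate=0.4, service=1.0)
    skewed: List[List[File]] = [[a, b], []]   # both files on one disk
    balanced: List[List[File]] = [[a], [b]]   # one file per disk
    print(system_mean_response(skewed), system_mean_response(balanced))
```

Under these assumptions, placing both example files on one disk gives a mean response time of about 3.0 s, while spreading them over two disks gives about 1.33 s. This is the kind of gap a file assignment strategy tries to close.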
