Metadata And Data Management In High Performance File And Storage Systems

With the advent of emerging “e-Science” applications, today’s scientific research increasingly relies on petascale-and-beyond computing over data sets of the same magnitude. While the computational power of supercomputers has recently entered the petascale era, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing the underlying I/O systems, in which the management of both metadata and data has significant performance implications. Prefetching/caching and data-locality-aware optimizations, as conventional and effective techniques for improving metadata and data I/O performance, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems.
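To make the idea of a weighted successor graph concrete, the following Python sketch shows one possible reading of it; it is not the algorithm proposed in this study. The class name `SuccessorGraphPrefetcher`, the parameters `window` and `group_size`, and the `1 / distance` weighting of indirect successors are all assumptions introduced for illustration. Each access adds an edge from the immediately preceding item (direct successor) and from a few earlier items (indirect successors), and prediction returns the heaviest outgoing edges.

```python
from collections import defaultdict, deque


class SuccessorGraphPrefetcher:
    """Hypothetical weighted successor graph over a metadata access stream."""

    def __init__(self, window=3, group_size=2):
        self.window = window          # assumed: how many predecessors get an edge
        self.group_size = group_size  # assumed: how many candidates to prefetch
        self.history = deque(maxlen=window)
        # edges[pred][succ] is the accumulated weight of the edge pred -> succ
        self.edges = defaultdict(lambda: defaultdict(float))

    def record(self, item):
        """Update edge weights for one observed access."""
        # Distance 1 is the direct predecessor; larger distances are indirect.
        for distance, pred in enumerate(reversed(self.history), start=1):
            if pred != item:
                # Assumed decay: nearer predecessors contribute more weight.
                self.edges[pred][item] += 1.0 / distance
        self.history.append(item)

    def predict(self, item):
        """Return up to group_size successors of item, heaviest edge first."""
        successors = self.edges.get(item, {})
        # Ties are broken by name so the result is deterministic.
        ranked = sorted(successors.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:self.group_size]]
```

In use, a metadata server would call `record` on every request it serves and `predict` on the item just accessed to choose what to prefetch. How the graph would be shared or partitioned across clustered servers is not addressed by this sketch.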
