A Metadata-Rich File System

Despite continual improvements in the performance and reliability of large-scale file systems, the management of file system metadata has changed little in the past decade. The mismatch between the size and complexity of large-scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and metadata requires considerable effort to keep the two consistent and can make system operation complex, slow, and inflexible. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, metadata, and file relationships are all first-class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS includes Quasar, an XPath-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype achieves superior ingest performance and comparable query performance on user-metadata-intensive operations, and superior performance on normal file metadata operations.
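As a rough illustration of a graph data model in which files, their metadata, and the relationships between them are all first-class objects, the Python sketch below stores user-defined attributes on both files and links and answers a two-step query: select files by attribute, then follow a named link. The class and method names (`FileGraph`, `select`, `follow`) and the example attributes are hypothetical. This is not the QFS implementation, and the method calls only loosely mimic the steps of an XPath-style path expression; they are not Quasar syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileNode:
    """A file as a first-class object carrying user-defined key/value metadata."""
    name: str
    attrs: dict[str, str] = field(default_factory=dict)


@dataclass
class Link:
    """A named relationship from one file to another; it can carry its own attributes."""
    name: str
    source: FileNode
    target: FileNode
    attrs: dict[str, str] = field(default_factory=dict)


class FileGraph:
    """Hypothetical in-memory graph of files and links (a sketch, not QFS itself)."""

    def __init__(self) -> None:
        self.files: dict[str, FileNode] = {}
        self.links: list[Link] = []

    def add_file(self, name: str, **attrs: str) -> FileNode:
        node = FileNode(name, dict(attrs))
        self.files[name] = node
        return node

    def link(self, name: str, src: str, dst: str, **attrs: str) -> Link:
        edge = Link(name, self.files[src], self.files[dst], dict(attrs))
        self.links.append(edge)
        return edge

    def select(self, **match: str) -> list[FileNode]:
        """Return files whose attributes contain every key/value pair in `match`."""
        return [f for f in self.files.values()
                if all(f.attrs.get(k) == v for k, v in match.items())]

    def follow(self, nodes: list[FileNode], link_name: str) -> list[FileNode]:
        """Traverse outgoing links named `link_name` from `nodes` (one path step)."""
        ids = {id(n) for n in nodes}
        return [e.target for e in self.links
                if e.name == link_name and id(e.source) in ids]


# Example: find the raw inputs that processed files were derived from.
g = FileGraph()
g.add_file("raw/run42.dat", kind="raw", instrument="ccd1")
g.add_file("proc/run42.fits", kind="processed")
g.link("derived_from", "proc/run42.fits", "raw/run42.dat", tool="calib-v2")

processed = g.select(kind="processed")
sources = g.follow(processed, "derived_from")
print([f.name for f in sources])  # ['raw/run42.dat']
```

In the database-plus-file-system arrangement the abstract describes, the `derived_from` relationship would typically live in a separate table that must be kept consistent with the files; in this sketch it is stored in the same structure as the files it connects.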
