Rapport: Semantic-sensitive Namespace Management in Large-scale File Systems

Explosive growth in the volume and complexity of data exacerbates the key challenge of managing data effectively and efficiently in a way that fundamentally improves the ease and efficacy of its use. Existing large-scale file systems rely on a hierarchically structured namespace, which leads to severe performance bottlenecks and makes it impossible to support real-time queries on multi-dimensional attributes. This paper proposes a novel semantic-sensitive scheme, called Rapport, that provides dynamic and adaptive namespace management and supports complex queries. The basic idea is to build the namespace of files from their semantic correlation and to exploit the dynamic evolution of their attributes to support namespace management. Extensive trace-driven experiments validate the effectiveness and efficiency of the proposed scheme. To the best of our knowledge, this is the first work on semantic-sensitive namespace management for ultra-scale file systems.
