SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems

The explosive growth in data volume and complexity poses great challenges for file systems. To address them, a new namespace management scheme is urgently needed, one that makes data access both easy and efficient. In almost all of today's file systems, namespace management is based on hierarchical directory trees. This tree-based scheme is prone to severe performance bottlenecks and often fails to provide real-time responses to complex data lookups. This paper proposes a Semantic-Aware Namespace scheme, called SANE, which provides dynamic and adaptive namespace management for ultra-large storage systems with billions of files. SANE introduces a new naming methodology based on the notion of a semantic-aware per-file namespace, which exploits semantic correlations among files to dynamically aggregate correlated files into small, flat, readily manageable groups and thereby achieve fast and accurate lookups. SANE is implemented as middleware in conventional file systems and works orthogonally to hierarchical directory trees. The semantic correlations and file groups identified in SANE can also be used to facilitate file prefetching and data de-duplication, among other system-level optimizations. Extensive trace-driven experiments on our prototype implementation validate the efficacy and efficiency of SANE.
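
To make the idea of a per-file namespace concrete, the following is a minimal sketch and not the paper's implementation. It assumes each file is described by a small numeric attribute vector and that correlation is measured by cosine similarity; the abstract does not state the actual correlation measure, and the file names, vectors, and the function `per_file_namespace` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def per_file_namespace(files, target, k=2):
    """Return the k files most correlated with `target` (a small, flat group).

    Assumption: `files` maps a file name to a hypothetical attribute vector.
    """
    scores = [
        (cosine(files[target], vec), name)
        for name, vec in files.items()
        if name != target
    ]
    # Highest similarity first; ties are broken by file name for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scores[:k]]


# Hypothetical attribute vectors, invented purely for illustration.
files = {
    "sim_run1.dat": [1.0, 0.9, 0.1],
    "sim_run2.dat": [0.9, 1.0, 0.2],
    "sim_plot.png": [0.8, 0.7, 0.3],
    "tax_2010.pdf": [0.0, 0.1, 1.0],
}

print(per_file_namespace(files, "sim_run1.dat", k=2))
```

A brute-force scan like this is linear in the number of files per query, so it only illustrates the grouping idea; a system at the scale the abstract targets would need an approximate similarity index instead.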
