Context-Aware Prefetching at the Storage Server

In many of today's applications, access to storage is the dominant cost of processing a user request. Data prefetching is commonly used to hide storage access latency. Under current prefetching techniques, the storage system prefetches a batch of blocks upon detecting an access pattern. However, the high level of concurrency in today's applications typically leads to interleaved block accesses, which makes detecting an access pattern very challenging. To address this problem, we propose and evaluate QuickMine, a novel, lightweight, and minimally intrusive method for context-aware prefetching. Under QuickMine, we capture application contexts, such as a transaction or query, and leverage them for context-aware prediction and improved prefetching effectiveness in the storage cache. We implement a prototype of our context-aware prefetching algorithm in a storage-area network (SAN) built using Network Block Device (NBD). Our prototype shows that context-aware prefetching clearly outperforms existing context-oblivious prefetching algorithms, improving application latency by up to a factor of 2 for two e-commerce workloads with repeatable access patterns, TPC-W and RUBiS.
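
To illustrate why tagging block accesses with an application context helps when accesses interleave, the following is a minimal sketch and not the QuickMine algorithm itself. It assumes each block request arrives with a hypothetical `context_id` (for example, a transaction or query identifier), and it substitutes a simple per-context successor count for the pattern mining used in the paper. The class name, method names, and block numbers are all illustrative assumptions.

```python
from collections import Counter, defaultdict


class ContextAwarePrefetcher:
    """Toy predictor: learns block successors within each context only."""

    def __init__(self, prefetch_depth=2):
        self.prefetch_depth = prefetch_depth
        # block -> Counter of blocks that followed it in the same context
        self.successors = defaultdict(Counter)
        # context_id -> last block accessed by that context (hypothetical tag)
        self.last_block = {}

    def access(self, context_id, block):
        """Record an access and return the blocks to prefetch next."""
        prev = self.last_block.get(context_id)
        if prev is not None:
            # Correlate only with the previous block of the SAME context,
            # so interleaving with other contexts does not pollute the model.
            self.successors[prev][block] += 1
        self.last_block[context_id] = block
        return self.predict(block)

    def predict(self, block):
        """Return up to prefetch_depth most frequent successors of block."""
        counts = self.successors.get(block, Counter())
        return [b for b, _ in counts.most_common(self.prefetch_depth)]

    def end_context(self, context_id):
        """Forget per-context state when a transaction or query finishes."""
        self.last_block.pop(context_id, None)


# Two concurrent contexts whose block accesses interleave at the storage server.
prefetcher = ContextAwarePrefetcher()
trace = [(1, 10), (2, 50), (1, 11), (2, 51), (1, 12), (2, 52)]
for ctx, blk in trace:
    prefetcher.access(ctx, blk)
prefetcher.end_context(1)
prefetcher.end_context(2)

print(prefetcher.predict(10))  # successors learned for block 10
print(prefetcher.predict(50))  # successors learned for block 50
```

A context-oblivious predictor that looked only at the global arrival order would associate block 10 with block 50, since those two requests are adjacent in the interleaved trace. Keeping the "previous block" per context instead recovers the sequences 10, 11, 12 and 50, 51, 52.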
