An Efficient SSD-based Hybrid Storage Architecture for Large-Scale Search Engines

Large-scale search engines store their massive index data on hard disk drives (HDDs) for capacity, but retrieval performance is then limited by the relatively low I/O performance of HDDs. Caching is an effective optimization, and many caching algorithms have been proposed to improve retrieval performance. However, given the high cost of memory and the sheer volume of data, an in-memory cache of limited capacity cannot fully resolve this problem. In this paper, we adopt a solid state disk (SSD) based hybrid storage architecture that uses the SSD as a secondary cache for memory. We analyze the I/O patterns of search engines and propose SSD-based data management policies for this architecture, covering data selection, data placement, and data replacement. Our main goal is to improve search engine performance while reducing operation cost inside the SSD. Experimental results demonstrate that the proposed architecture improves the hit ratio by 13.31%, the performance by 41.05%, and the average access time inside the SSD by 43.83%, and reduces block erasure operations by 71.52%.
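To make the layering concrete, the following is a minimal sketch of a memory cache backed by an SSD cache, with the HDD as the final source. It is an illustration under stated assumptions, not the policies proposed in this paper: plain LRU stands in for the data selection, placement, and replacement policies, and the names `LRUCache`, `TwoLevelCache`, and `read_from_hdd` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU map (a stand-in for the paper's policies)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert or refresh an entry; return the evicted (key, value) or None."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # least recently used
        return None


class TwoLevelCache:
    """Memory cache in front of an SSD cache in front of the HDD.

    Assumption: entries evicted from memory are demoted to the SSD,
    and SSD hits are promoted back into memory.
    """

    def __init__(self, mem_capacity, ssd_capacity, read_from_hdd):
        self.mem = LRUCache(mem_capacity)
        self.ssd = LRUCache(ssd_capacity)
        self.read_from_hdd = read_from_hdd  # hypothetical HDD read callback
        self.hits = {"mem": 0, "ssd": 0, "hdd": 0}

    def lookup(self, key):
        value = self.mem.get(key)
        if value is not None:
            self.hits["mem"] += 1
            return value

        value = self.ssd.get(key)
        if value is not None:
            self.hits["ssd"] += 1
        else:
            value = self.read_from_hdd(key)  # slowest path
            self.hits["hdd"] += 1

        evicted = self.mem.put(key, value)  # promote into memory
        if evicted is not None:
            self.ssd.put(*evicted)  # demote the memory victim to the SSD
        return value


if __name__ == "__main__":
    cache = TwoLevelCache(1, 2, lambda term: "postings-for-" + term)
    for term in ["a", "b", "a", "a"]:
        cache.lookup(term)
    print(cache.hits)
```

In this sketch every memory eviction becomes an SSD write. A policy that aims to reduce block erasures would be more selective about what is written to the SSD, which is the role of the data selection and placement policies named above.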
