Efficient Online Index Maintenance for SSD-based Information Retrieval Systems

Solid state disks (SSDs) can potentially eliminate the I/O bottleneck for many conventional applications. However, SSDs have a distinctive erase-before-write characteristic, which may make existing index maintenance methods ill-suited to them. In this paper, we propose Hybrid Merge, a new online index maintenance strategy for information retrieval systems that store inverted indexes on SSDs rather than on hard disk drives (HDDs). We analyze existing indexing methods experimentally and design a new merge-based indexing method that performs no random writes. The method exploits the SSD's fast random reads to overcome the shortcomings of existing methods. Experimental results show that the proposed method improves both indexing and query performance while incurring extremely low write traffic compared to existing approaches.
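The abstract does not describe how Hybrid Merge itself works, so the following is only a minimal sketch of the generic merge-based maintenance idea it builds on, under toy assumptions. Newly indexed postings accumulate in a memory buffer. When the buffer fills, they are merged with the existing on-disk run in one sequential pass and written out as a new run, so no existing page is updated in place. The class `MergeBasedIndex`, its `buffer_limit` parameter, and the `postings_written` counter are hypothetical names chosen for illustration, and a Python list stands in for the SSD-resident index.

```python
from collections import defaultdict


class MergeBasedIndex:
    """Toy merge-based inverted index (generic merge scheme, NOT Hybrid Merge itself)."""

    def __init__(self, buffer_limit):
        self.buffer_limit = buffer_limit  # hypothetical: max postings held in memory
        self.buffer = defaultdict(list)   # in-memory index: term -> list of doc ids
        self.buffered = 0                 # postings currently in the buffer
        self.on_disk = []                 # stand-in for the on-SSD run: sorted (term, postings) pairs
        self.postings_written = 0         # hypothetical write-traffic counter, in postings

    def add_document(self, doc_id, terms):
        # Record each distinct term of the document once.
        for term in set(terms):
            self.buffer[term].append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Merge the sorted on-disk run with the sorted buffer in one sequential pass.
        new_terms = sorted(self.buffer.items())
        old = self.on_disk
        merged, i, j = [], 0, 0
        while i < len(old) or j < len(new_terms):
            if j == len(new_terms) or (i < len(old) and old[i][0] < new_terms[j][0]):
                merged.append(old[i])
                i += 1
            elif i == len(old) or new_terms[j][0] < old[i][0]:
                merged.append(new_terms[j])
                j += 1
            else:
                # Same term: older postings come before the newly buffered doc ids.
                merged.append((old[i][0], old[i][1] + new_terms[j][1]))
                i += 1
                j += 1
        # Write the merged run out as a new sequence instead of patching old pages.
        self.on_disk = merged
        self.postings_written += sum(len(p) for _, p in merged)
        self.buffer.clear()
        self.buffered = 0

    def lookup(self, term):
        # A real run would be searched by binary search; a linear scan keeps the sketch short.
        disk = next((p for t, p in self.on_disk if t == term), [])
        return disk + self.buffer.get(term, [])
```

Because every flush rewrites the whole run, `postings_written` grows with the size of the index rather than with the size of each update. That rewrite volume is the kind of write traffic the abstract reports the proposed method keeps extremely low.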
