Tide-tree: A self-tuning indexing scheme for hybrid storage systems

Main-memory indices are built on the assumption that RAM is large enough to hold the data. Because main memory is volatile and has a high unit price, indices on secondary storage such as SSDs and HDDs are widely used. However, I/O between secondary storage and main memory remains the bottleneck for query efficiency. In this paper, we propose Tide-tree, a self-tuning indexing scheme for RAM/disk-based hybrid storage systems. Tide-tree aims to overcome the obstacles that main-memory and disk-based indices face: like the tide, it adapts to the running environment to achieve a win in both space and performance. In particular, Tide-tree efficiently and adaptively partitions the tree structure into layers according to the storage conditions it senses, and applies a self-tuning algorithm to dynamically load nodes into main memory. We employ memory mapping to solve the persistence problem of main-memory indices and to improve the efficiency of data synchronization and pointer translation. To further enhance the independence of Tide-tree, we employ an index head and a level address table to manage the whole index. With the index head, we propose three efficient operations: index rebuild, index load, and range search. We have conducted extensive experiments comparing Tide-tree with several state-of-the-art indices, and the results validate the efficiency, reusability, and stability of Tide-tree.
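
To illustrate how memory mapping can address persistence and pointer translation, the following is a minimal sketch and not the Tide-tree implementation. It assumes a hypothetical fixed-size node layout (`NODE`), a hypothetical sentinel (`NULL`), and a plain binary search tree in place of the paper's structure. Nodes refer to their children by slot number rather than by raw address, so the links remain valid when the file is mapped again, possibly at a different virtual address.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical node layout (an assumption, not taken from the paper):
# key, left-child slot, right-child slot, each a signed 64-bit integer.
NODE = struct.Struct("<qqq")
NULL = -1  # sentinel slot meaning "no child"


def write_node(buf, slot, key, left=NULL, right=NULL):
    """Store a node in the given slot; children are referenced by slot number."""
    NODE.pack_into(buf, slot * NODE.size, key, left, right)


def read_node(buf, slot):
    """Translate a slot number into a byte offset and decode the node there."""
    return NODE.unpack_from(buf, slot * NODE.size)


def search(buf, key, slot=0):
    """Binary-search-tree lookup that follows slot numbers instead of pointers."""
    while slot != NULL:
        node_key, left, right = read_node(buf, slot)
        if key == node_key:
            return True
        slot = left if key < node_key else right
    return False


def build_demo(path):
    """Create a three-node tree directly inside a memory-mapped file."""
    with open(path, "wb") as f:
        f.truncate(3 * NODE.size)  # reserve space; an empty file cannot be mapped
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as buf:
        write_node(buf, 0, 50, left=1, right=2)
        write_node(buf, 1, 20)
        write_node(buf, 2, 70)
        buf.flush()  # synchronize the mapped pages with the file


def reopen_and_search(path, key):
    """Map the file again, as a restarted process would, and query it."""
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as buf:
        return search(buf, key)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_path = os.path.join(tmp, "index.bin")
        build_demo(index_path)
        print(reopen_and_search(index_path, 70))
        print(reopen_and_search(index_path, 65))
```

Because every link is a slot number relative to the start of the mapping, no pointer rewriting is needed after the index is reloaded; the operating system pages nodes in on demand.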
