CloudTree: A Library to Extend Cloud Services for Trees

In this work, we propose a library that enables a cloud client to create and manage tree data structures in a cloud. As a proof of concept, we implement a new cloud service, CloudTree. With CloudTree, users can organize big data into tree data structures of their choice, which are physically stored in a cloud. We use caching, prefetching, and aggregation techniques in the design and implementation of CloudTree to enhance performance. We have implemented Binary Search Tree (BST) and Prefix Tree services as the current members of CloudTree and have benchmarked their performance on the Amazon Cloud. The ideas and techniques used in the design and implementation of the BST and prefix tree are generic and can therefore also be applied to other types of trees, such as B-trees, and to other link-based data structures, such as linked lists and graphs. Preliminary experimental results show that CloudTree is useful and efficient for various big data applications.
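
As an illustration only, the following Python sketch shows how caching, prefetching, and aggregation might combine for a BST whose nodes live in a remote store. It is not CloudTree's API: `FakeCloudStore`, `fetch_subtree`, `CachedBST`, `build_balanced`, and `prefetch_depth` are hypothetical names, and the in-memory store is an assumed stand-in that only counts simulated round trips.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    key: int
    left: Optional[int] = None   # id of the left child in the store
    right: Optional[int] = None  # id of the right child in the store


class FakeCloudStore:
    """Hypothetical in-memory stand-in for a remote store; counts round trips."""

    def __init__(self) -> None:
        self.items: Dict[int, Node] = {}
        self.round_trips = 0

    def put(self, node_id: int, node: Node) -> None:
        self.items[node_id] = node

    def fetch_subtree(self, node_id: int, depth: int) -> Dict[int, Node]:
        # Aggregation (assumed): one request returns the node and its
        # descendants up to `depth` levels below it.
        self.round_trips += 1
        result: Dict[int, Node] = {}
        frontier = [node_id]
        for _ in range(depth + 1):
            next_frontier: List[int] = []
            for i in frontier:
                node = self.items[i]
                result[i] = node
                next_frontier += [c for c in (node.left, node.right) if c is not None]
            frontier = next_frontier
        return result


def build_balanced(store: FakeCloudStore, keys: List[int]) -> Optional[int]:
    """Store a balanced BST over sorted keys; node ids are the keys themselves."""
    if not keys:
        return None
    mid = len(keys) // 2
    node = Node(keys[mid],
                build_balanced(store, keys[:mid]),
                build_balanced(store, keys[mid + 1:]))
    store.put(node.key, node)
    return node.key


class CachedBST:
    """Client-side view of a remote BST with a node cache and prefetching."""

    def __init__(self, store: FakeCloudStore, root_id: Optional[int],
                 prefetch_depth: int = 1) -> None:
        self.store = store
        self.root_id = root_id
        self.prefetch_depth = prefetch_depth
        self.cache: Dict[int, Node] = {}

    def _load(self, node_id: int) -> Node:
        if node_id not in self.cache:
            # Cache miss: prefetch the surrounding subtree in one request.
            self.cache.update(self.store.fetch_subtree(node_id, self.prefetch_depth))
        return self.cache[node_id]

    def contains(self, key: int) -> bool:
        node_id = self.root_id
        while node_id is not None:
            node = self._load(node_id)
            if key == node.key:
                return True
            node_id = node.left if key < node.key else node.right
        return False


store = FakeCloudStore()
tree = CachedBST(store, build_balanced(store, list(range(15))), prefetch_depth=1)
found = tree.contains(0)          # walks four levels of the tree
requests = store.round_trips      # fewer requests than levels visited
```

In this sketch, a larger `prefetch_depth` trades bigger responses for fewer round trips, and repeated lookups near the same path are served from the client-side cache.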
