ES2: A cloud data storage system for supporting both OLTP and OLAP

Cloud computing represents a paradigm shift driven by the increasing demand from Web-based applications for elastic, scalable, and efficient system architectures that can support their ever-growing data volumes and large-scale data analysis. A typical data management system has to handle real-time updates by individual users as well as periodic large-scale analytical processing, indexing, and data extraction. Although these operations may take place in the same domain, systems for transactional processing and for periodic analytical processing have largely evolved independently. This system-level separation has caused problems such as reduced data freshness and serious data storage redundancy. Ideally, it would be more efficient to run ad-hoc analytical processing directly on the same data. However, to the best of our knowledge, such an approach has not been adopted in real implementations. Motivated by this observation, we have designed and implemented epiC, an elastic, power-aware, data-intensive Cloud platform that supports both data-intensive analytical operations (referred to as OLAP) and online transactions (referred to as OLTP). In this paper, we present ES2, the elastic data storage system of epiC, which is designed to support both functionalities within the same storage. We describe the system architecture and the function of each system component, and report experimental results that demonstrate the efficiency of the system.
