Ad-hoc aggregate query processing algorithms based on bit-store for query intensive applications in cloud computing

Ad-hoc aggregate queries, which extract valuable summary information from massive datasets to help decision-makers make the right decisions, are extremely important for query-intensive applications in cloud computing. Current data storage schemes (row-store and column-store) cannot efficiently answer ad-hoc aggregate queries on massive datasets in cloud computing. This paper proposes a new data storage structure, the bit vector storage structure (bit-store for short), and focuses on ad-hoc aggregate query algorithms based on bit-store. First, the storage model of bit-store, including its attribute encoding schemes and bit file organization, is introduced. Second, aggregate operations for query processing are presented for the different encoding schemes. Third, the costs of the different aggregate operations are analyzed. Finally, analytical and experimental results demonstrate the effectiveness and efficiency of the proposed algorithms.
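The abstract does not spell out bit-store's encoding schemes or bit file layout. The sketch below is therefore only an illustration under stated assumptions, not the paper's algorithm. It assumes a numeric attribute stored as binary bit slices (one bit vector per bit position) and a categorical attribute stored as one bitmap per distinct value. With that layout, COUNT, SUM, MIN and MAX can be answered using only bitwise AND and population counts, so the values never have to be reassembled. All function names (`encode_bit_slices`, `bitstore_sum`, and so on) are hypothetical, and Python integers stand in for on-disk bit files.

```python
from typing import Dict, List, Optional

# Hypothetical sketch: Python ints serve as arbitrary-length bit vectors,
# where bit r corresponds to row r of the table.


def popcount(bits: int) -> int:
    """Number of set bits (selected rows) in a bit vector."""
    return bin(bits).count("1")


def encode_bitmap(values: List[str]) -> Dict[str, int]:
    """Assumed equality encoding: one bit vector per distinct value."""
    bitmaps: Dict[str, int] = {}
    for row, v in enumerate(values):
        bitmaps[v] = bitmaps.get(v, 0) | (1 << row)
    return bitmaps


def encode_bit_slices(values: List[int], width: int) -> List[int]:
    """Assumed binary encoding: slices[b] holds bit b of every value."""
    slices = [0] * width
    for row, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} unsigned bits")
        for b in range(width):
            if (v >> b) & 1:
                slices[b] |= 1 << row
    return slices


def bitstore_count(selection: int) -> int:
    return popcount(selection)


def bitstore_sum(slices: List[int], selection: int) -> int:
    # SUM = sum over b of 2^b * |slice_b AND selection|.
    return sum(popcount(s & selection) << b for b, s in enumerate(slices))


def bitstore_max(slices: List[int], selection: int) -> Optional[int]:
    # Walk from the most significant slice, keeping rows that have a 1 there.
    if selection == 0:
        return None
    candidates, result = selection, 0
    for b in reversed(range(len(slices))):
        with_one = candidates & slices[b]
        if with_one:
            candidates, result = with_one, result | (1 << b)
    return result


def bitstore_min(slices: List[int], selection: int) -> Optional[int]:
    # Mirror of MAX: prefer rows that have a 0 in the current slice.
    if selection == 0:
        return None
    candidates, result = selection, 0
    for b in reversed(range(len(slices))):
        with_zero = candidates & ~slices[b]
        if with_zero:
            candidates = with_zero
        else:
            result |= 1 << b
    return result


# Toy table: COUNT, SUM, MIN, MAX of sales WHERE region = 'east'
region = ["east", "west", "east", "north", "east"]
sales = [5, 12, 7, 3, 9]
region_bm = encode_bitmap(region)
sales_slices = encode_bit_slices(sales, width=4)
sel = region_bm["east"]  # rows 0, 2, 4
print(bitstore_count(sel), bitstore_sum(sales_slices, sel),
      bitstore_min(sales_slices, sel), bitstore_max(sales_slices, sel))
# → 3 21 5 9
```

In this sketch, each aggregate touches only the `width` bit vectors of the measured attribute plus the selection bitmap. Every AND and popcount also processes many rows per machine word. This is the kind of saving a bit-level layout aims for, though the paper's own cost model may differ.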
