Alpine: Efficient In-Situ Data Exploration in the Presence of Updates

Ever-growing data collections create the need for brief explorations of the available data to extract relevant information before a decision has to be made. In this context of data exploration, current data analysis solutions struggle to quickly pinpoint useful information in data collections. One major reason is that loading data into a DBMS without knowing which part of it will actually be useful is a major bottleneck. To remove this bottleneck, state-of-the-art approaches perform queries in situ, thus avoiding the loading overhead. In situ query engines, however, are index-oblivious and lack sophisticated techniques to reduce the amount of data to be accessed. Furthermore, applications constantly generate fresh data and update the existing raw data files, whereas state-of-the-art in situ approaches support only append-like workloads. In this demonstration, we showcase the efficiency of adaptive indexing and partitioning techniques for analytical queries in the presence of updates. We demonstrate an online partitioning and indexing tuner for in situ querying which plugs into a query engine and offers support for fast queries over raw data files. We present Alpine, our prototype implementation, which combines the tuner with a query executor incorporating in situ query techniques to provide efficient raw data access. We will visually demonstrate how Alpine incrementally and adaptively builds auxiliary data structures and indexes over raw data files and how it adapts its behavior as a side effect of updates in the raw data files.
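
To make the idea of building auxiliary structures incrementally over a raw file more concrete, the following is a minimal Python sketch. It is not Alpine's implementation: the class name `LazyZoneMap`, the fixed-size row partitioning, the per-partition min/max summaries, and the use of file size and modification time to detect updates are all assumptions chosen for illustration. The sketch also assumes a well-formed CSV file with a header line and an integer-valued column.

```python
import os


class LazyZoneMap:
    """Hypothetical helper: per-partition min/max summaries over a raw CSV file.

    Summaries are filled in only for partitions that a query actually scans,
    and they are discarded whenever the raw file appears to have changed.
    """

    def __init__(self, path, column, rows_per_partition=2):
        self.path = path
        self.column = column                  # zero-based index of an integer column
        self.rows_per_partition = rows_per_partition
        self.signature = None                 # (size, mtime) seen at the last layout pass
        self.partitions = []                  # entries: [start, end, min, max]
        self.scanned = 0                      # partitions read by the most recent query

    def _file_signature(self):
        st = os.stat(self.path)
        return (st.st_size, st.st_mtime_ns)

    def _rebuild_layout(self):
        # One pass that records byte ranges only; no values are parsed here.
        self.partitions = []
        with open(self.path, "rb") as f:
            f.readline()                      # skip the header line
            start, rows = f.tell(), 0
            while f.readline():
                rows += 1
                if rows == self.rows_per_partition:
                    self.partitions.append([start, f.tell(), None, None])
                    start, rows = f.tell(), 0
            if rows:
                self.partitions.append([start, f.tell(), None, None])

    def query(self, lo, hi):
        """Return all values v of the indexed column with lo <= v <= hi."""
        sig = self._file_signature()
        if sig != self.signature:             # raw file changed: drop stale summaries
            self._rebuild_layout()
            self.signature = sig
        self.scanned = 0
        hits = []
        with open(self.path, "rb") as f:
            for part in self.partitions:
                start, end, pmin, pmax = part
                if pmin is not None and (pmax < lo or pmin > hi):
                    continue                  # summary proves there is no match
                f.seek(start)
                lines = f.read(end - start).splitlines()
                values = [int(line.split(b",")[self.column]) for line in lines]
                part[2], part[3] = min(values), max(values)   # side effect of the scan
                self.scanned += 1
                hits.extend(v for v in values if lo <= v <= hi)
        return hits
```

In this sketch the first query over a file scans every partition and leaves summaries behind, later queries skip partitions whose summaries rule out a match, and any detected change to the file resets the summaries. A real tuner would presumably invalidate only the affected partitions and choose partition boundaries adaptively; the sketch does neither.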
