Overview of Data Exploration Techniques

Data exploration is about efficiently extracting knowledge from data even if we do not know exactly what we are looking for. In this tutorial, we survey recent developments in the emerging area of database systems tailored for data exploration. We discuss new ideas on how to store and access data, as well as new ideas on how to interact with a data system so that users and applications can quickly figure out which parts of the data are of interest. In addition, we discuss how to exploit lessons learned from past research, the new challenges that data exploration creates, emerging applications, and future research directions.
