Fast Cartography for Data Explorers

Exploration is the act of investigating unknown regions. An analyst exploring a database cannot, by definition, compose the right query or choose the appropriate data mining algorithm. However, current data management tools cannot operate without well-defined instructions. Browsing an unknown database can therefore be a very tedious process. Our project, Atlas, is an attempt to circumvent this problem. Atlas is an active DBMS front-end designed for database exploration. It generates and ranks several data maps from a user query. A data map is a small set of database queries (fewer than a dozen), in which each query describes an interesting region of the database. The user can pick one of these queries and submit it for further exploration. To support interaction, the system should operate in quasi-real time, possibly at the cost of precision, and require as few input parameters as possible. We draft a framework to generate such data maps and introduce several short- to long-term research problems.
