Capturing the Laws of (Data) Nature

Model fitting is at the core of many scientific and industrial applications. These models encode a wealth of domain knowledge, something a database decidedly lacks. Except for simple cases, databases cannot yet hope to achieve a deeper understanding of the hidden relationships in the data. We propose to harvest the statistical models that users fit to the stored data as part of their analysis and use them to advance physical data storage and approximate query answering to unprecedented levels of performance. We motivate our approach with an astronomical use case and discuss its potential.
