Large-scale Data Exploration Using Explanatory Regression Functions

Analysts wishing to explore multivariate data spaces typically issue queries involving selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. T...
