Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)

The area of Principles of Data Management (PDM) has made crucial contributions to the development of formal frameworks for understanding and managing data and knowledge. This work has involved a rich cross-fertilization between PDM and other disciplines in mathematics and computer science, including logic, complexity theory, and knowledge representation. We anticipate ongoing expansion of PDM research as the technology and applications involving data management continue to grow and evolve. In particular, the lifecycle of Big Data Analytics raises a wealth of challenges that PDM can help address. In this report we identify some of the most important research directions where the PDM community has the potential to make significant contributions. We do so from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term.
