Research Directions for Principles of Data Management (Abridged)

In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) met for a workshop at Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of the workshop was to identify and explore some of the most important research directions that have high relevance to society and to Computer Science today, and where the PDM community has the potential to make significant contributions. This article presents a summary of the report created by the workshop [38]. That report describes the family of research directions that the workshop focused on from three perspectives: potential practical relevance, results already obtained, and research questions that appear surmountable in the short and medium term. The report organizes the identified research challenges for PDM around seven core themes, namely Managing Data at Scale, Multi-model Data, Uncertain Information, Knowledge-enriched Data, Data Management and Machine Learning, Process and Data, and Ethics and Data Management. Since new challenges in PDM arise all the time, we note that this list of themes is not intended to be exclusive.

[1]  Tomasz Imielinski,et al.  Incomplete Information in Relational Databases , 1984, JACM.

[2]  Magdalena Ortiz,et al.  Closed Predicates in Description Logics: Results on Combined Complexity , 2016, AMW.

[3]  Alin Deutsch,et al.  Automatic verification of data-centric business processes , 2009, ICDT '09.

[4]  Michael Carl Tschantz,et al.  Automated Experiments on Ad Privacy Settings , 2014, Proc. Priv. Enhancing Technol..

[5]  Gianmarco De Francisci Morales,et al.  SAMOA: scalable advanced massive online analysis , 2015, J. Mach. Learn. Res..

[6]  Serge Abiteboul,et al.  Managing your digital life , 2015, Commun. ACM.

[7]  Martin Hepp,et al.  The Web of Data for E-Commerce: Schema.org and GoodRelations for Researchers and Practitioners , 2015, ICWE.

[8]  Ke Yi,et al.  Towards a Worst-Case I/O-Optimal Algorithm for Acyclic Joins , 2016, PODS.

[9]  Manik Varma,et al.  FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning , 2014, KDD.

[10]  E. F. Codd,et al.  Understanding Relations (Installment #7) , 1974, FDT Bull. ACM SIGFIDET SIGMOD.

[11]  Jan Chomicki,et al.  Prioritized repairing and consistent query answering in relational databases , 2012, Annals of Mathematics and Artificial Intelligence.

[12]  Oren Etzioni,et al.  Navigating Extracted Data with Schema Discovery , 2007, WebDB.

[13]  Marcelo Arenas,et al.  Foundations of Data Exchange , 2014 .

[14]  Georg Gottlob,et al.  Efficient Algorithms for Processing XPath Queries , 2002, VLDB.

[15]  Georg Gottlob,et al.  Expressive Languages for Querying the Semantic Web , 2018, TODS.

[16]  Jonas Lerman,et al.  Big Data and Its Exclusions , 2013 .

[17]  Christopher De Sa,et al.  Incremental Knowledge Base Construction Using DeepDive , 2015, The VLDB Journal.

[18]  Eyke Hüllermeier,et al.  Extreme F-measure Maximization using Sparse Probability Estimates , 2016, ICML.

[19]  Dan Suciu,et al.  Probabilistic Databases with MarkoViews , 2012, Proc. VLDB Endow..

[20]  Mihalis Yannakakis,et al.  Algorithms for Acyclic Database Schemes , 1981, VLDB.

[21]  Todd L. Veldhuizen,et al.  Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm , 2012, arXiv:1210.0481.

[22]  Serge Abiteboul,et al.  Comparing workflow specification languages: A matter of views , 2012, TODS.

[23]  Latanya Sweeney,et al.  Discrimination in online ad delivery , 2013, CACM.

[24]  François Goasdoué,et al.  Query-Oriented Summarization of RDF Graphs , 2015, BICOD.

[25]  M. Arenas,et al.  SQL's Three-Valued Logic and Certain Answers , 2015 .

[26]  Jean-François Baget,et al.  On rules with existential variables: Walking the decidability line , 2011, Artif. Intell..

[27]  Jon Feldman,et al.  On distributing symmetric streaming computations , 2008, SODA '08.

[28]  Daniel Deutch,et al.  A quest for beauty and wealth (or, business processes for database researchers) , 2011, PODS.

[29]  Eli Upfal,et al.  The VC-Dimension of SQL Queries and Selectivity Estimation through Sampling , 2011, ECML/PKDD.

[30]  John Langford,et al.  A reliable effective terascale linear learning system , 2011, J. Mach. Learn. Res..

[31]  Pierre Senellart,et al.  Provenance Circuits for Trees and Treelike Instances , 2015, ICALP.

[32]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[33]  Serge Abiteboul,et al.  Collaborative Access Control in WebdamLog , 2015, SIGMOD Conference.

[34]  Andrew D. Selbst,et al.  Big Data's Disparate Impact , 2016 .

[35]  Thomas Schwentick,et al.  Inference of concise regular expressions and DTDs , 2010, TODS.

[36]  Richard Hull,et al.  Data Centric BPM and the Emerging Case Management Standard: A Short Survey , 2012, Business Process Management Workshops.

[37]  Toniann Pitassi,et al.  Fairness through awareness , 2011, ITCS '12.

[38]  Thomas Schwentick,et al.  Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151) , 2017, Dagstuhl Manifestos.

[39]  Leonid Libkin,et al.  Incomplete data: what went wrong, and how to fix it , 2014, PODS.

[40]  Dan Suciu,et al.  Tractability in probabilistic databases , 2011, ICDT '11.

[41]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[42]  Michael Benedikt,et al.  XPath satisfiability in the presence of DTDs , 2008, JACM.

[43]  Jeffrey D. Ullman,et al.  Optimizing Multiway Joins in a Map-Reduce Environment , 2011, IEEE Transactions on Knowledge and Data Engineering.

[44]  Dániel Marx,et al.  Size Bounds and Query Plans for Relational Joins , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[45]  Helen J. Wang,et al.  Online aggregation , 1997, SIGMOD '97.

[46]  Jeffrey Heer,et al.  Enterprise Data Analysis and Visualization: An Interview Study , 2012, IEEE Transactions on Visualization and Computer Graphics.

[47]  Egor V. Kostylev,et al.  Beyond Well-designed SPARQL , 2016, ICDT.

[48]  E. F. Codd,et al.  Understanding relations , 1973, SGMD.

[49]  Evaggelia Pitoura,et al.  DisC diversity: result diversification based on dissimilarity and coverage , 2012, Proc. VLDB Endow..

[50]  Anil Nigam,et al.  Business artifacts: An approach to operational specification , 2003, IBM Syst. J..

[51]  Kevin Wilkinson,et al.  Data integration flows for business intelligence , 2009, EDBT '09.

[52]  Dan Suciu,et al.  Worst-Case Optimal Algorithms for Parallel Query Processing , 2016, ICDT.

[53]  Leopoldo E. Bertossi,et al.  Database Repairing and Consistent Query Answering , 2011, Database Repairing and Consistent Query Answering.

[54]  Jianwen Su,et al.  Towards Formal Analysis of Artifact-Centric Business Process Models , 2007, BPM.

[55]  C. J. Date.  Database in depth - relational theory for practitioners , 2005 .

[56]  Kilian Q. Weinberger,et al.  Feature hashing for large scale multitask learning , 2009, ICML '09.

[57]  Diego Calvanese,et al.  Conjunctive query containment and answering under description logic constraints , 2008, TOCL.

[58]  Dan Suciu,et al.  A query language for NC , 1994, PODS '94.

[59]  Alessandro Artale,et al.  A Cookbook for Temporal Conceptual Data Modelling with Description Logics , 2012, TOCL.

[60]  Alin Deutsch,et al.  Automatic Verification of Database-Centric Systems , 2014, SIGMOD Rec..

[61]  Diego Calvanese,et al.  Foundations of data-aware process analysis: a database theory perspective , 2013, PODS.

[62]  Thomas Lukasiewicz,et al.  Generalized Consistent Query Answering under Existential Rules , 2016, KR.

[63]  Donald D. Chamberlin,et al.  Access Path Selection in a Relational Database Management System , 1989 .

[64]  Peter J. Haas,et al.  Ripple joins for online aggregation , 1999, SIGMOD '99.

[65]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[66]  Bin Wu,et al.  Wander Join: Online Aggregation via Random Walks , 2016, SIGMOD Conference.

[67]  Wim Martens,et al.  The (Almost) Complete Guide to Tree Pattern Containment , 2015, PODS.

[68]  Serge Abiteboul,et al.  Data Responsibly: Fairness, Neutrality and Transparency in Data Analysis , 2016, EDBT.

[69]  Georg Gottlob,et al.  Schema mapping discovery from data instances , 2010, JACM.

[70]  Eli Upfal,et al.  The Case for Predictive Database Systems: Opportunities and Challenges , 2011, CIDR.

[71]  Frederick Y. Wu,et al.  Business Artifact-Centric Modeling for Real-Time Performance Monitoring , 2011, BPM.

[72]  Phokion G. Kolaitis,et al.  Learning schema mappings , 2012, ICDT '12.

[73]  Prasoon Goyal,et al.  Probabilistic Databases , 2009, Encyclopedia of Database Systems.

[74]  Moustapha Cissé,et al.  Robust Bloom Filters for Large MultiLabel Classification Tasks , 2013, NIPS.

[75]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, CACM.

[76]  Sreenivas Gollapudi,et al.  Diversifying search results , 2009, WSDM '09.

[77]  Jianwen Su,et al.  Universal Artifacts: A New Approach to Business Process Management (BPM) Systems , 2016, TMIS.

[78]  Juliana Freire,et al.  Provenance and scientific workflows: challenges and opportunities , 2008, SIGMOD Conference.

[79]  Evgeny Kharlamov,et al.  Faceted search over RDF-based knowledge graphs , 2016, J. Web Semant..

[80]  Dan Olteanu,et al.  Learning Linear Regression Models over Factorized Joins , 2016, SIGMOD Conference.

[81]  Emir Pasalic,et al.  Design and Implementation of the LogicBlox System , 2015, SIGMOD Conference.

[82]  Diego Calvanese,et al.  Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite Family , 2007, Journal of Automated Reasoning.