Provenance for Non-Experts

The proliferation of data-intensive systems geared towards direct use by non-experts, such as natural language question answering systems and query-by-example frameworks, calls for the incorporation of provenance management. Provenance is arguably even more important for such systems than for “classic” database applications, because the ambiguity typical of user specifications (e.g., questions phrased in natural language or given through examples) raises the level of uncertainty. Existing provenance solutions are not geared towards non-experts, and the typical complexity and size of provenance instances render them ill-suited for this goal. In this paper we outline our ongoing research and preliminary results, which address these challenges with the aim of developing provenance solutions that explain computation results to non-expert users.
