General-purpose Declarative Inductive Programming with Domain-Specific Background Knowledge for Data Wrangling Automation

Given one or two examples, humans are good at working out how to solve a problem regardless of its domain, because they can recognise what kind of problem it is and choose the appropriate background knowledge for the context. For instance, asked to transform the string "8/17/2017" into "17th of August of 2017", a human proceeds in two steps: (1) recognise that the string is a date and (2) map that date to the 17th of August of 2017. Inductive Programming (IP) aims at learning declarative (functional or logic) programs from examples. Two key advantages of IP are its use of background knowledge and its ability to synthesise programs from a few input/output examples (as humans do). In this paper we propose to use IP as a means of automating the repetitive data manipulation tasks that arise frequently during data wrangling. We show that general-purpose declarative programming languages, used jointly with generic IP systems and suitably defined domain-specific knowledge, allow many specific data wrangling problems from different application domains to be solved automatically from very few examples. We also propose an integrated benchmark for data wrangling, which we share publicly with the community.
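The two-step date example can be made concrete with a toy sketch. It is not the system described in the paper: it is a brute-force enumeration in Python, assuming a hypothetical, hand-written set of date primitives (`DATE_BK`), a hypothetical list of separators, and US-style month/day/year input as in the example above. It illustrates how a single input/output pair can select a program once domain-specific background knowledge is available.

```python
from itertools import product

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def looks_like_date(text):
    """Step 1: recognise the domain (assumed format: M/D/YYYY)."""
    parts = text.split("/")
    return len(parts) == 3 and all(p.isdigit() for p in parts)


def parse_date(text):
    """Split an M/D/YYYY string into integer (month, day, year)."""
    month, day, year = (int(p) for p in text.split("/"))
    return month, day, year


def ordinal(n):
    """Render 1 -> '1st', 2 -> '2nd', 11 -> '11th', 17 -> '17th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Hypothetical domain-specific background knowledge for dates.
# Each primitive maps a parsed (month, day, year) tuple to a string.
DATE_BK = {
    "day_number": lambda d: str(d[1]),
    "day_ordinal": lambda d: ordinal(d[1]),
    "month_number": lambda d: str(d[0]),
    "month_name": lambda d: MONTHS[d[0] - 1],
    "year": lambda d: str(d[2]),
}

# Hypothetical separators the search may place between the three fields.
SEPARATORS = [" ", "-", "/", " of "]


def make_program(names, sep):
    """Build a string-to-string program from three primitives and a separator."""
    def program(text):
        date = parse_date(text)
        return sep.join(DATE_BK[name](date) for name in names)
    return program


def synthesise(examples):
    """Step 2: return the first candidate consistent with every example."""
    if not all(looks_like_date(inp) for inp, _ in examples):
        return None  # the date background knowledge does not apply
    for sep, names in product(SEPARATORS, product(DATE_BK, repeat=3)):
        program = make_program(names, sep)
        if all(program(inp) == out for inp, out in examples):
            return names, sep, program
    return None


names, sep, program = synthesise([("8/17/2017", "17th of August of 2017")])
print(names, repr(sep))
print(program("12/1/1999"))
```

With the single example given, the search settles on the day as an ordinal, the month name, and the year, joined by " of ", and the resulting program generalises to unseen dates. A real IP system searches a far richer space of declarative programs; the fixed three-field template here only keeps the sketch short.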
