Modelling Machine Learning Algorithms on Relational Data with Datalog

The standard workflow in data science is to prepare features inside a database, export them as a denormalized data frame, and then apply machine learning algorithms. This workflow is suboptimal for two reasons. First, it requires denormalizing the database, which can turn a small data problem into a big data problem. Second, it assumes that the machine learning algorithm is disentangled from the relational model of the problem. This seems to be a serious limitation, since the relational model encodes valuable domain expertise. In this paper we explore the use of convex optimization, and specifically linear programming, for modelling machine learning algorithms on relational data in a way that is integrated with data processing operators. We use SolverBlox, a framework that accepts Datalog code as input and feeds it into a linear programming solver. We demonstrate how common machine learning algorithms can be expressed, and we present use case scenarios in which combining data processing with the modelling of optimization problems inside a database offers significant advantages.
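The first shortcoming, denormalization inflating the data, can be illustrated with a small sketch. The example below is not taken from the paper: the schema (`customer`, `product`, `purchase`), the feature counts, and the helper names `build_db` and `cell_count` are all hypothetical, and the synthetic data assumes every customer buys every product. It uses Python's built-in `sqlite3` module to compare the number of stored values in the normalized tables with the number of values in the joined data frame that would be exported.

```python
import sqlite3

# Hypothetical query that produces the denormalized "data frame".
DENORMALIZED_QUERY = (
    "SELECT * FROM purchase "
    "JOIN customer USING (cid) "
    "JOIN product USING (pid)"
)


def build_db(n_customers=100, n_products=50, n_features=20):
    """Create an in-memory database with synthetic, illustrative data."""
    conn = sqlite3.connect(":memory:")
    cust_cols = ", ".join(f"c{i} REAL" for i in range(n_features))
    prod_cols = ", ".join(f"p{i} REAL" for i in range(n_features))
    conn.execute(f"CREATE TABLE customer (cid INTEGER PRIMARY KEY, {cust_cols})")
    conn.execute(f"CREATE TABLE product (pid INTEGER PRIMARY KEY, {prod_cols})")
    conn.execute("CREATE TABLE purchase (cid INTEGER, pid INTEGER, qty REAL)")

    # One key column plus n_features feature columns per entity row.
    marks = ", ".join("?" for _ in range(n_features + 1))
    conn.executemany(
        f"INSERT INTO customer VALUES ({marks})",
        [(c, *[float(c + i) for i in range(n_features)]) for c in range(n_customers)],
    )
    conn.executemany(
        f"INSERT INTO product VALUES ({marks})",
        [(p, *[float(p * i) for i in range(n_features)]) for p in range(n_products)],
    )
    # Assumption: every customer purchases every product once.
    conn.executemany(
        "INSERT INTO purchase VALUES (?, ?, ?)",
        [(c, p, 1.0) for c in range(n_customers) for p in range(n_products)],
    )
    return conn


def cell_count(conn, query):
    """Return rows * columns for the result of a query."""
    cur = conn.execute(query)
    rows = cur.fetchall()
    return len(rows) * len(cur.description)


if __name__ == "__main__":
    conn = build_db()
    normalized = sum(
        cell_count(conn, f"SELECT * FROM {table}")
        for table in ("customer", "product", "purchase")
    )
    denormalized = cell_count(conn, DENORMALIZED_QUERY)
    print("values stored in normalized tables:", normalized)
    print("values in denormalized join:       ", denormalized)
```

In this sketch the customer and product features are repeated on every purchase row of the join, so the exported frame grows with the product of the fact-table size and the feature count, while the normalized tables grow roughly with their sum. How large the gap is in practice depends on the schema and the data.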
