A MapReduce Construct for Yap Prolog

The aim of this work was to design and implement a high-level Prolog primitive based on the MapReduce programming paradigm. MapReduce is a programming model made popular by Google in 2008, even though its origins are more remote. It is composed of two simple operations, map and reduce, which can easily be applied to numerous algorithms. Prolog, in turn, is a first-order logic predicate language with significant declarative power, which allows the programmer to focus on the resolution strategy for a problem rather than on the technicalities of its execution. Prolog is also especially suited for data storage and processing; in fact, Inductive Logic Programming (ILP) deals with making inferences from such data. A MapReduce construct applied in these circumstances would be able to scale that process efficiently and thus significantly reduce execution times. Including a MapReduce programming primitive in Prolog has three major benefits: (i) it makes available a high-level abstract construct that implements the MapReduce functional model while maintaining the declarative nature of the programs; (ii) it gives access to a previously non-existent Prolog construct that is relevant to applications in numerous fields of knowledge; (iii) it allows for parallelism, thus speeding up the execution of programs that use this construct. The last benefit is particularly relevant now that multicore processors have become the favourite choice for assembling machines, even those for personal use. This, along with the increasingly large data processing requirements of everyday life, renders a framework that uses multicore architectures for efficient data processing highly relevant. MapReduce for Prolog focuses on multicore architectures, but our primitive also supports hybrid environments (shared and distributed memory), implicitly and transparently.
MapReduce for Prolog was implemented in the Yap system and follows a master-slave paradigm, in which the master is responsible for dividing and assigning the work and the slaves for processing the chunks dispatched to them. The interface of this construct has various customisation levels, and our aim is for it to be integrated into the Yap Prolog system as a built-in construct. Our system was successfully tested using four distinct applications common in the literature: two of these were numeric, and the other two were composed of Prolog terms. The tests were made using two implementations of the same programming interface, one for a cluster of machines and another for a multicore architecture. It was determined that our construct scaled almost ideally for these datasets, both in shared and in distributed memory. Four scheduling methods were also developed and assessed, and the two most efficient ones will be made available in the final version of the library. The effect of varying the chunk size was also evaluated for different datasets and scheduling methods, in order to define standard parameters for MapReduce for Prolog.
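
The master-slave scheme described above can be illustrated with a minimal sketch. It is written in Python rather than Prolog and does not reproduce the interface of the Yap construct; the names `split_into_chunks`, `map_reduce`, `chunk_size` and `workers` are hypothetical and chosen only for this illustration. The sketch also assumes that the reduce operation is associative and that `initial` is its identity, so that partial results computed per chunk can be combined safely. The thread pool is there to show the structure of the scheme, not to demonstrate actual speed-ups.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(items, chunk_size):
    """Master step: divide the input into consecutive chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def map_reduce(map_fn, reduce_fn, items, initial, chunk_size=4, workers=2):
    """Apply map_fn to every item and fold the results with reduce_fn.

    Assumption (hypothetical interface): reduce_fn is associative and
    `initial` is its identity, so chunk-level partial results can be merged.
    """
    def process_chunk(chunk):
        # Worker step: map each item of one chunk and reduce locally.
        return reduce(reduce_fn, (map_fn(x) for x in chunk), initial)

    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The pool plays the role of the workers receiving chunks.
        partials = list(pool.map(process_chunk, chunks))
    # Master step: combine the partial results in chunk order.
    return reduce(reduce_fn, partials, initial)


# Sum of the squares of 1..10, computed chunk by chunk.
total = map_reduce(lambda x: x * x, lambda a, b: a + b, list(range(1, 11)), 0)
```

In this sketch the chunk size controls how much work each dispatch carries, which is the kind of parameter whose effect the evaluation mentioned above sets out to measure.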
