An efficient programming model for memory-intensive recursive algorithms using parallel disks

In order to keep up with the demand for solutions to problems with ever-increasing data sets, both academia and industry have embraced commodity computer clusters with locally attached disks or SANs as an inexpensive alternative to supercomputers. With the advent of tools for programming with parallel disks, such as MapReduce, STXXL, and Roomy, which allow the developer to focus on higher-level algorithms, programmer productivity for memory-intensive programs has increased many-fold. However, such parallel tools have primarily targeted iterative programs. We propose a programming model for migrating recursive RAM-based legacy algorithms to parallel disks. Many memory-intensive symbolic algebra algorithms are most easily expressed as recursive algorithms. In this case, the programming challenge is multiplied, since the developer must restructure such an algorithm with two criteria in mind: converting a naturally recursive algorithm into an iterative one, while simultaneously exposing any potential data parallelism (as required for parallel disks). Our model reduces the substantial effort that goes into the design phase of an external-memory algorithm. Research in this area over the past ten years has focused on per-problem solutions, without providing much insight into the connection between legacy algorithms and out-of-core algorithms. Our method shows how legacy algorithms employing recursion and non-streaming memory access can be more easily translated into efficient parallel disk-based algorithms. We demonstrate the ideas on the largest computation of its kind: the determinization, via subset construction, and minimization of very large nondeterministic finite automata (NFA). To our knowledge, this is the largest subset construction reported in the literature. Determinization of large NFAs has long been a major computational hurdle in the study of permutation classes defined by token passing networks. We used the programming model to design and implement an efficient NFA determinization algorithm that solves the next stage in the analysis of token passing networks representing two stacks in series.
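To make the recursion-to-iteration restructuring concrete, the following is a minimal in-RAM Python sketch contrasting a naturally recursive subset construction with a frontier-at-a-time iterative version of the kind the model targets. The class and function names are illustrative assumptions, not the authors' implementation, and the sketch omits the disk-based machinery (parallel-disk collections, on-disk delayed duplicate detection) entirely.

```python
# A minimal, in-RAM sketch contrasting two formulations of subset construction.
# All names (NFA, determinize_*) are illustrative; this is not the paper's
# implementation and omits the parallel-disk data structures entirely.

class NFA:
    """delta maps (state, symbol) -> frozenset of states."""
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states, self.alphabet = states, alphabet
        self.delta = delta            # dict[(state, symbol)] -> frozenset
        self.start, self.accepting = start, accepting

def step(nfa, subset, symbol):
    """Image of a subset of NFA states under one input symbol."""
    out = set()
    for q in subset:
        out |= nfa.delta.get((q, symbol), frozenset())
    return frozenset(out)

# 1. Naturally recursive (depth-first) formulation: easy to write for RAM,
#    but each call probes the shared 'dfa' table at unpredictable points,
#    which defeats streaming disk access (and risks stack overflow at scale).
def determinize_recursive(nfa, subset=None, dfa=None):
    if subset is None:
        subset, dfa = frozenset([nfa.start]), {}
    if subset in dfa:
        return dfa
    dfa[subset] = {}
    for a in nfa.alphabet:
        target = step(nfa, subset, a)
        dfa[subset][a] = target
        determinize_recursive(nfa, target, dfa)
    return dfa

# 2. Iterative, frontier-at-a-time formulation: the whole frontier is expanded
#    as one batch, and duplicates are removed afterwards.  Each batch is
#    data-parallel and maps naturally onto parallel-disk operations such as
#    delayed duplicate detection.
def determinize_frontier(nfa):
    start = frozenset([nfa.start])
    dfa, frontier = {}, {start}
    while frontier:
        batch = {(s, a): step(nfa, s, a)
                 for s in frontier for a in nfa.alphabet}
        for s in frontier:
            dfa[s] = {a: batch[(s, a)] for a in nfa.alphabet}
        # Deduplicate after the batch: keep only subsets not yet expanded.
        frontier = set(batch.values()) - set(dfa)
    return dfa
```

The frontier version touches the growing DFA table only once per iteration, in bulk, so each expansion step can be streamed to and from parallel disks and deduplicated afterwards, whereas the recursive version interleaves table lookups with every call.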
