On the CALM Principle for Bulk Synchronous Parallel Computation

In recent years, considerable emphasis has been placed on two apparently disjoint fields: data-parallel systems and eventually consistent distributed systems. In this paper we propose a theoretical study of an eventually consistent data-parallel computational model. The keystone is the recent finding that there exists a class of programs that can be computed in an eventually consistent, coordination-free way: monotonic programs. This principle is called CALM and has been proven for distributed asynchronous settings. We make the case that, using the techniques developed by Ameloot et al., CALM does not hold in general for data-parallel systems, in which computation usually proceeds synchronously in rounds and communication is reliable. We then show that, using novel techniques that subsume those of Ameloot et al., the satisfiability of the CALM principle is directly related to the assumptions imposed on the behavior of the system.
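
The intuition behind monotonic programs being computable without coordination can be illustrated with a small sketch. It is not the formal model studied in the paper: the function names (`transitive_closure`, `sinks`, `run_incrementally`) and the batch-delivery simulation are assumptions introduced only for illustration. A node receives its input in batches, emits results after each batch, and never retracts them. For a monotonic query (transitive closure) the final output does not depend on the arrival order; for a non-monotonic query (nodes with no outgoing edge) it can.

```python
def transitive_closure(edges):
    """Monotonic query: adding input edges can only add output pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can be extended safely.
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def sinks(edges):
    """Non-monotonic query: nodes with no outgoing edge.
    A later edge can invalidate an earlier answer."""
    nodes = {n for edge in edges for n in edge}
    return {n for n in nodes if all(src != n for (src, _) in edges)}


def run_incrementally(query, batches):
    """Hypothetical coordination-free node: evaluates the query on the
    input seen so far after each batch and never retracts emitted output."""
    seen, emitted = set(), set()
    for batch in batches:
        seen |= set(batch)
        emitted |= query(seen)
    return emitted


order_a = [[(1, 2)], [(2, 3)]]
order_b = [[(2, 3)], [(1, 2)]]

# Monotonic query: both arrival orders give the same final output.
tc_a = run_incrementally(transitive_closure, order_a)
tc_b = run_incrementally(transitive_closure, order_b)

# Non-monotonic query: the output depends on the arrival order.
sinks_a = run_incrementally(sinks, order_a)
sinks_b = run_incrementally(sinks, order_b)
```

In this sketch, the non-monotonic query would need some form of coordination (for example, knowing that all input has arrived) before emitting an answer, which is the kind of requirement the CALM principle is concerned with.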
