Differential Dataflow

Existing computational models for processing continuously changing input data are unable to efficiently support iterative queries except in limited special cases. This makes it difficult to perform complex tasks, such as social-graph analysis on changing data at interactive timescales, which would greatly benefit those analyzing the behavior of services like Twitter. In this paper we introduce a new model called differential computation, which extends traditional incremental computation to allow arbitrarily nested iteration, and explain—with reference to a publicly available prototype system called Naiad—how differential computation can be efficiently implemented in the context of a declarative dataparallel dataflow language. The resulting system makes it easy to program previously intractable algorithms such as incrementally updated strongly connected components, and integrate them with data transformation operations to obtain practically relevant insights from real data streams.

[1]  DONALD MICHIE,et al.  “Memo” Functions and Machine Learning , 1968, Nature.

[2]  Frank Wm. Tompa,et al.  Efficiently updating materialized views , 1986, SIGMOD '86.

[3]  V. S. Subrahmanian,et al.  Maintaining views incrementally , 1993, SIGMOD Conference.

[4]  Robert Harper,et al.  Self-adjusting computation , 2004, Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004..

[5]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[6]  Gavin M. Bierman,et al.  Lost in translation: formalizing proposed extensions to c# , 2007, OOPSLA.

[7]  Yuan Yu,et al.  Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.

[8]  Michael Isard,et al.  DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language , 2008, OSDI.

[9]  Marc Najork,et al.  The scalable hyperlink store , 2009, HT '09.

[10]  Geoffrey C. Fox,et al.  Twister: a runtime for iterative MapReduce , 2010, HPDC '10.

[11]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[12]  Guy E. Blelloch,et al.  Traceable data types for self-adjusting computation , 2010, PLDI '10.

[13]  Michael D. Ernst,et al.  HaLoop , 2010, Proc. VLDB Endow..

[14]  Joseph M. Hellerstein,et al.  MapReduce Online , 2010, NSDI.

[15]  Lenin Ravindranath,et al.  Nectar: Automatic Management of Data and Computation in Datacenters , 2010, OSDI.

[16]  Yanfeng Zhang,et al.  iMapReduce: A Distributed Computing Framework for Iterative Computation , 2011, Journal of Grid Computing.

[17]  Pramod Bhatotia,et al.  Incoop: MapReduce for incremental computations , 2011, SoCC.

[18]  Limin Jia,et al.  Maintaining distributed logic programs incrementally , 2011, Comput. Lang. Syst. Struct..

[19]  Steven Hand,et al.  CIEL: A Universal Execution Engine for Distributed Data-Flow Computing , 2011, NSDI.

[20]  Volker Markl,et al.  Spinning Fast Iterative Data Flows , 2012, Proc. VLDB Endow..

[21]  David Maier,et al.  Logic and lattices for distributed programming , 2012, SoCC '12.

[22]  Sreenivas Gollapudi,et al.  Of hammers and nails: an empirical comparison of three paradigms for processing large graphs , 2012, WSDM '12.

[23]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[24]  Sudipto Guha,et al.  REX: Recursive, Delta-Based Data-Centric Computation , 2012, Proc. VLDB Endow..

[25]  Scott Shenker,et al.  Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters , 2012, HotCloud.

[26]  Amir Shaikhha,et al.  DBToaster: higher-order delta processing for dynamic, frequently fresh views , 2012, The VLDB Journal.

[27]  Yanfeng Zhang,et al.  PrIter: A Distributed Framework for Prioritizing Iterative Computations , 2011, IEEE Transactions on Parallel and Distributed Systems.

[28]  Camil Demetrescu,et al.  Reactive Imperative Programming with Dataflow Constraints , 2014, ACM Trans. Program. Lang. Syst..