The widespread appeal of MapReduce is due, in part, to its simple programming model. Programmers provide only application logic while the MapReduce framework handles the logistics of data distribution and parallel task management.
We present the Continuous-MapReduce (C-MR) framework, which implements a modified MapReduce processing model to continuously execute workflows of MapReduce jobs on unbounded data streams. In keeping with the philosophy of MapReduce, C-MR abstracts away the complexities of parallel stream processing and workflow scheduling while providing the simple and familiar MapReduce programming interface, extended with stream window semantics.
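To make the idea of a MapReduce interface with window semantics concrete, the following is a minimal single-threaded sketch in Python. It is not C-MR's API: the function name `windowed_mapreduce`, its parameters, and the choice of count-based windows (`size` and `slide` measured in records) are assumptions made purely for illustration. The programmer still supplies only a Map function and a Reduce function; the window definition is the one addition.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def windowed_mapreduce(
    stream: Iterable[str],
    map_fn: Callable[[str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
    size: int,
    slide: int,
) -> Iterator[Dict[str, int]]:
    """Hypothetical helper: run map_fn/reduce_fn over count-based sliding windows."""
    window = deque(maxlen=size)  # holds the most recent `size` records
    since_emit = 0               # records seen since the last window result
    for record in stream:
        window.append(record)
        since_emit += 1
        # Emit once the first window is full, then after every `slide` records.
        if len(window) == size and since_emit >= slide:
            since_emit = 0
            groups: Dict[str, List[int]] = defaultdict(list)
            for item in window:                  # Map phase over the window
                for key, value in map_fn(item):
                    groups[key].append(value)
            # Reduce phase: one call per key, as in batch MapReduce
            yield {key: reduce_fn(key, values) for key, values in groups.items()}


def word_map(word: str) -> Iterable[Tuple[str, int]]:
    return [(word, 1)]


def word_reduce(key: str, values: List[int]) -> int:
    return sum(values)


results = list(
    windowed_mapreduce(["a", "b", "a", "c", "a", "a"], word_map, word_reduce, size=4, slide=2)
)
```

In this sketch every window is recomputed from scratch and processed sequentially; a real stream-processing framework would distribute the Map and Reduce work across parallel workers and preserve stream order among them.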
Modifying the MapReduce processing model allowed us to: (1) maintain correct stream order and execution semantics in the presence of parallel and asynchronous processing elements; (2) implement an operator scheduler framework to facilitate latency-oriented scheduling policies for executing complex workflows of MapReduce jobs; and (3) leverage much of the last decade of stream processing research, including pipelined parallelism, incremental processing for both Map and Reduce operations, minimization of redundant computations, sharing of sub-queries, and adaptive query processing.
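One well-known way to minimize redundant computation across overlapping windows is to split the stream into non-overlapping sub-windows ("panes"), reduce each pane once, and merge pane results to form each window result. The sketch below illustrates that general technique for a word count; it is an assumption-laden illustration rather than a description of C-MR's implementation. The name `paned_word_counts` and its parameters are hypothetical, and the approach assumes the Reduce operation can be expressed as a merge of partial results (true for counts and sums, not for every Reduce function).

```python
from collections import Counter, deque
from typing import Dict, Iterable, Iterator


def paned_word_counts(
    stream: Iterable[str], panes_per_window: int, pane_size: int
) -> Iterator[Dict[str, int]]:
    """Hypothetical helper: sliding-window word counts built from per-pane partial counts."""
    panes = deque(maxlen=panes_per_window)  # partial results for the current window
    current: Counter = Counter()            # partial result for the pane being filled
    filled = 0
    for word in stream:
        current[word] += 1                  # each record is processed exactly once
        filled += 1
        if filled == pane_size:
            panes.append(current)           # oldest pane is evicted automatically
            current, filled = Counter(), 0
            if len(panes) == panes_per_window:
                total: Counter = Counter()
                for pane in panes:          # merge partial results, not raw records
                    total.update(pane)
                yield dict(total)


pane_results = list(
    paned_word_counts(["a", "b", "a", "c", "a", "a"], panes_per_window=2, pane_size=2)
)
```

Here the window covers `panes_per_window * pane_size` records and slides by one pane, so consecutive windows share all but one pane and only the newest pane requires fresh Map and Reduce work.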
C-MR was developed for use on a multiprocessor architecture, where we demonstrate its effectiveness at supporting high-performance stream processing even in the presence of load spikes and external workloads.