Iterative algorithms occur in many domains of data analysis, such as machine learning and graph analysis. With growing interest in running these algorithms on very large data sets, we see a need for new techniques to execute iterations in a massively parallel fashion. In prior work, we showed how to extend a parallel data flow system so that it runs iterative algorithms efficiently in a shared-nothing environment. Our approach supports the incremental processing nature of many of these algorithms.
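To make the notion of incremental processing concrete, the following is a minimal single-machine sketch of connected components computed with a workset, so that each iteration only touches vertices whose label changed in the previous one. It is not Stratosphere code: the function name `connected_components_incremental`, the solution-set/workset structure, and the `updates_per_iteration` metric are assumptions chosen for illustration.

```python
from collections import defaultdict


def connected_components_incremental(edges):
    """Label every vertex with the smallest vertex id in its component.

    Hypothetical illustration of an incremental (workset) iteration;
    it does not use or mirror any Stratosphere API.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Solution set: current label per vertex, initially its own id.
    labels = {v: v for v in neighbors}
    # Workset: vertices whose label changed in the previous iteration.
    workset = set(labels)
    updates_per_iteration = []

    while workset:
        # Only changed vertices offer their label to their neighbors.
        # A candidate is kept only if it improves on the neighbor's label.
        candidates = {}
        for v in workset:
            for n in neighbors[v]:
                if labels[v] < candidates.get(n, labels[n]):
                    candidates[n] = labels[v]

        # Apply the improvements after all candidates are collected,
        # so every iteration reads the labels of the previous one.
        for n, label in candidates.items():
            labels[n] = label

        updates_per_iteration.append(len(candidates))
        workset = set(candidates)

    return labels, updates_per_iteration


labels, updates = connected_components_incremental([(1, 2), (2, 3), (4, 5)])
```

The per-iteration update counts shrink as the labels converge, which is the property an incremental iteration exploits: later iterations do work proportional to the number of changes rather than to the size of the whole data set.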
In this demonstration proposal, we illustrate the process of implementing, compiling, optimizing, and executing iterative algorithms on Stratosphere, using examples from graph analysis and machine learning. In the first step, we show the algorithm's code and a visualization of the data flow programs it produces. The second step shows the optimizer's execution plan choices. The last step monitors the execution of the program, visualizing the state of the operators and additional metrics such as per-iteration runtime and number of updates.
To show that the data flow abstraction makes it easy to build custom programming APIs, we also present programs written against a lightweight Pregel API that is layered on top of our system with little programming effort.
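As a rough illustration of why such a layer can be small, the sketch below builds a Pregel-style vertex-centric interface on top of a plain iteration loop: each superstep groups messages by target vertex and applies a user-defined compute function. This is a single-machine toy, not the API presented in the demonstration; `run_vertex_program`, `propagate_max`, and the `(new_value, message)` return convention are hypothetical names and choices.

```python
from collections import defaultdict


def run_vertex_program(values, edges, compute, max_supersteps=50):
    """Run a vertex-centric program in synchronous supersteps.

    Hypothetical sketch: `compute(value, messages)` returns a pair
    (new_value, message), where message is None if nothing is sent.
    Every edge target is assumed to be a key of `values`.
    """
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    state = dict(values)
    # Superstep 0: every vertex is active and has an empty inbox.
    inbox = {v: [] for v in state}
    supersteps = 0

    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        # Only vertices that received messages are computed.
        for v, messages in inbox.items():
            new_value, message = compute(state[v], messages)
            state[v] = new_value
            if message is not None:
                for dst in out_edges[v]:
                    outbox[dst].append(message)
        # Messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return state, supersteps


def propagate_max(value, messages):
    """Vertex program: spread the maximum value through the graph."""
    best = max([value, *messages])
    # Send in the first superstep (no messages yet) or after a change.
    if not messages or best > value:
        return best, best
    return best, None


# Undirected chain 1 - 2 - 3 - 4, given as directed edges both ways.
chain = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)]
result, steps = run_vertex_program({1: 3, 2: 6, 3: 2, 4: 1}, chain, propagate_max)
```

The loop terminates once no vertex sends a message, which corresponds to all vertices voting to halt in the Pregel model. In a data flow setting, the per-superstep grouping of messages by target vertex would be expressed with the system's grouping or join operators instead of a dictionary.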