A Fine-Grained, Dynamic Load Distribution Model for Parallel Stream Processing

Our goal is to address the unique characteristics and limitations of emerging large-scale commodity clusters to leverage their potential for the parallel processing of multidimensional data streams. To this end, we describe a new distributed stream processing model that integrates data and task parallelism by partitioning workloads into self-describing chunks that are dynamically assigned to available computing resources. We adapt the degree and means of processing parallelism using simulation-driven heuristic search algorithms with underlying incremental bin-packing techniques to direct chunk assignments, facilitating adaptation to fluctuating resource availability and workload characteristics. Our experimental study quantifies the potential yields of our approach under a variety of workload and computing configurations.
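To make the chunk-assignment idea concrete, the following is a minimal sketch of incremental bin packing for chunks arriving on a stream. It uses a simple greedy least-loaded-worker heuristic as an illustrative stand-in; the paper's actual simulation-driven heuristic search is more elaborate, and all names (`Worker`, `assign_chunks`, the cost values) are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Worker:
    # Only `load` participates in comparisons, so the heap is keyed on load.
    load: float
    name: str = field(compare=False)

def assign_chunks(chunk_costs, worker_names):
    """Incrementally assign each arriving chunk to the currently
    least-loaded worker (greedy bin packing via a min-heap).

    chunk_costs  -- estimated processing cost of each chunk, in arrival order
    worker_names -- identifiers of the available computing resources
    Returns (assignment, loads): chunk indices per worker, final load per worker.
    """
    heap = [Worker(0.0, w) for w in worker_names]
    heapq.heapify(heap)
    assignment = {w: [] for w in worker_names}
    for i, cost in enumerate(chunk_costs):
        w = heapq.heappop(heap)   # least-loaded worker so far
        w.load += cost            # charge this chunk's cost to it
        assignment[w.name].append(i)
        heapq.heappush(heap, w)   # reinsert with updated load
    return assignment, {w.name: w.load for w in heap}
```

Because each decision touches only the heap top, the scheme is incremental: it can react to a new chunk (or, with small extensions, to a worker joining or leaving) without re-solving the whole packing problem.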