Network Awareness in Internet-Scale Stream Processing

Efficient query processing across a wide-area network requires network awareness, i.e., tracking and leveraging knowledge of network characteristics when making optimization decisions. This paper summarizes our work on network-aware query processing techniques for widely distributed, large-scale stream-processing applications. We first discuss the operator placement problem (i.e., deciding where to execute the operators of a query plan) and present results, based on a prototype deployment on the PlanetLab network testbed, that quantify the benefits of network awareness. We then summarize our current focus, the operator distribution problem, which involves parallelizing the evaluation of a single operator in a networked setting.