Adaptive Provisioning of Stream Processing Systems in the Cloud

With the advent of data-intensive applications that generate large volumes of real-time data, distributed stream processing systems (DSPSs) have become increasingly important in domains such as social networking and web analytics. In practice, DSPSs must handle highly variable workloads caused by unpredictable changes in stream rates. Cloud computing offers an elastic infrastructure from which DSPSs can obtain resources on demand, but deciding on the correct resource allocation when deploying a DSPS in the cloud remains an open problem. This paper proposes an adaptive approach for provisioning virtual machines (VMs) for a DSPS in the cloud. We first run a set of benchmarks across performance metrics such as network latency and jitter to explore the feasibility of cloud-based DSPS deployments. Based on these results, we propose a VM provisioning algorithm for DSPSs that reacts to changes in the stream workload. Through a prototype implementation on Amazon EC2, we show that our approach achieves low-latency stream processing when VMs are not overloaded, while adjusting resources dynamically as the workload changes.
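To make the idea of workload-reactive provisioning concrete, the following is a minimal sketch of a rate-based scaling rule. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function names, the `per_vm_capacity` and `headroom` parameters, and the assumption that every VM can process a fixed number of tuples per second are all hypothetical and chosen only for illustration.

```python
import math


def target_vm_count(stream_rate, per_vm_capacity, headroom=0.8, min_vms=1):
    """Return how many VMs keep each one at or below `headroom` of its capacity.

    All parameters are illustrative assumptions:
      stream_rate      observed input rate (tuples per second)
      per_vm_capacity  rate one VM can sustain (tuples per second)
      headroom         fraction of capacity we are willing to use (0 < h <= 1)
    """
    if per_vm_capacity <= 0:
        raise ValueError("per_vm_capacity must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    if stream_rate < 0:
        raise ValueError("stream_rate must be non-negative")
    usable = per_vm_capacity * headroom
    return max(min_vms, math.ceil(stream_rate / usable))


def scaling_decision(current_vms, stream_rate, per_vm_capacity, headroom=0.8):
    """Return the number of VMs to add (positive) or release (negative)."""
    target = target_vm_count(stream_rate, per_vm_capacity, headroom)
    return target - current_vms


if __name__ == "__main__":
    # Hypothetical workload: the stream rate rises, then falls again.
    vms = 1
    for rate in (1500, 9000, 4200, 500):
        delta = scaling_decision(vms, rate, per_vm_capacity=2500)
        vms += delta
        print(f"rate={rate:>5} tuples/s  delta={delta:+d}  vms={vms}")
```

A real deployment would likely smooth the observed rate and account for VM start-up delay before acting on a decision; this sketch reacts to each sample directly to keep the rule easy to follow.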
