Elastic Stream Processing for the Internet of Things

Emerging trends such as Big Data and the Internet of Things pose new challenges to established data stream processing engines. In particular, with the advent of the Internet of Things, the volume of data to be processed can become very large. Since companies usually aim for cost efficiency, engines need to support resource elasticity to minimize operational cost while maintaining real-time processing capabilities. In this work, we propose and implement the distributed Platform for Elastic Stream Processing (PESP). An extensive evaluation demonstrates the practical feasibility and efficiency of the system design: compared to an over-provisioning baseline, PESP reduces cost by 20% with minimal effect on the Quality of Service (QoS), and compared to an under-provisioning baseline, it improves QoS by 72%.
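To make the cost/QoS trade-off between the two baselines and an elastic engine concrete, the following is a minimal sketch, not PESP's actual scaling algorithm. It assumes a generic reactive threshold policy (scale out by one VM when utilization exceeds `scale_up_at`, scale in when it drops below `scale_down_at`), measures cost in VM-ticks, and counts a QoS violation whenever demand exceeds provisioned capacity. The function `simulate`, the `Outcome` type, all parameters, and the load trace are hypothetical and chosen only for illustration; they are not taken from the paper's evaluation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    cost: int        # total VM-ticks paid for over the trace
    violations: int  # ticks where demand exceeded provisioned capacity


def simulate(load: List[int], policy: str, capacity_per_vm: int = 10,
             min_vms: int = 1, max_vms: int = 5,
             scale_up_at: float = 0.8, scale_down_at: float = 0.3) -> Outcome:
    """Replay a hypothetical load trace under one provisioning policy.

    "over"    - always run max_vms (over-provisioning baseline)
    "under"   - always run min_vms (under-provisioning baseline)
    "elastic" - reactive threshold scaling, one VM per tick
    """
    if policy == "over":
        vms = max_vms
    elif policy in ("under", "elastic"):
        vms = min_vms
    else:
        raise ValueError(f"unknown policy: {policy}")

    cost = 0
    violations = 0
    for demand in load:
        cost += vms  # pay for every VM running during this tick
        if demand > vms * capacity_per_vm:
            violations += 1  # not enough capacity: QoS miss
        if policy == "elastic":
            # Decide based on this tick's utilization; the change takes
            # effect on the next tick, so scaling lags behind demand.
            utilization = demand / (vms * capacity_per_vm)
            if utilization > scale_up_at and vms < max_vms:
                vms += 1
            elif utilization < scale_down_at and vms > min_vms:
                vms -= 1
    return Outcome(cost, violations)


load = [5, 12, 25, 38, 45, 30, 14, 8, 4, 3]  # hypothetical demand per tick
for name in ("over", "under", "elastic"):
    print(name, simulate(load, name))
```

On this toy trace, over-provisioning costs 50 VM-ticks with no violations, under-provisioning costs 10 with 6 violations, and the elastic policy costs 30 with 4 violations. The elastic run sits between the two baselines on both axes; its remaining violations come from scaling one VM per tick, which is why practical systems typically add load prediction or larger scaling steps.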