JetStream: enabling high performance event streaming across cloud data-centers

The easily-accessible computation power offered by cloud infrastructures, coupled with the Big Data revolution, is expanding the scale and speed at which data analysis is performed. In their quest to find the Value in the 3 Vs of Big Data, applications process larger data sets, within and across clouds. Enabling fast data transfers across geographically distributed sites becomes particularly important for applications that manage continuous streams of events in real time. Scientific applications (e.g. the Ocean Observatory Initiative or the ATLAS experiment) as well as commercial ones (e.g. Microsoft's Bing and Office 365 large-scale services) operate on tens of data-centers around the globe and follow similar patterns: they aggregate monitoring data, assess the QoS, or run global data mining queries based on inter-site event stream processing. In this paper, we propose a set of strategies for efficient transfers of events between cloud data-centers and we introduce JetStream, a prototype implementing these strategies as a high performance batch-based streaming middleware. JetStream self-adapts to the streaming conditions by modeling and monitoring a set of context parameters. It further aggregates the available bandwidth by enabling multi-route streaming across cloud sites. The prototype was validated on tens of nodes from US and European data-centers of the Windows Azure cloud, using synthetic benchmarks and application code from the context of the ALICE experiment at CERN. The results show a 250-fold increase in transfer rate over individual event streaming. Introducing an adaptive transfer strategy brings an additional 25% gain. Finally, the transfer rate can further be tripled by using multi-route streaming.
