Extending CometCloud to Process Dynamic Data Streams on Heterogeneous Infrastructures

The coordination of multiple concurrent data stream processing tasks over a distributed Cloud infrastructure is described. Coordination (control) is carried out by an interpreter based on Reference nets (a particular type of Petri net), implemented alongside the CometCloud system. One benefit of this approach is that the model can also be executed directly to support the coordination action. The proposed approach supports the simultaneous processing of data streams and enables dynamic, on-demand scale-up of heterogeneous computational resources, while meeting the quality of service requirement (throughput) of each data stream. We assume that the processing to be applied to each data stream is known a priori. As execution proceeds on CometCloud, the workflow interpreter monitors the arrival rate and throughput of each data stream. We demonstrate the control strategy using two key actions: allocating and deallocating resources dynamically based on the number of tasks waiting to be executed (using a predefined threshold). A variety of other control actions can also be supported and are described in this work. Evaluation is carried out using a distributed CometCloud deployment, where the allocation of new resources can be based on a number of different criteria, such as: (i) differences between sites, i.e. based on the types of resources supported (e.g. GPU vs. CPU only, FPGAs, etc.); (ii) cost of execution; and (iii) failure rate and likely resilience.
