Model-driven development of data intensive applications over cloud resources

Abstract The proliferation of sensors in recent years has generated large amounts of raw data, forming data streams that need to be processed. Cloud resources are often used for such processing because of their flexibility, but sensor streaming applications frequently need to support operational and control actions with real-time, low-latency requirements that go beyond the cost-effective and flexible solutions offered by existing cloud frameworks such as Apache Kafka, Apache Spark Streaming, or Map-Reduce Streams. In this paper, we describe a model-driven, stepwise-refinement methodology for streaming applications executed over clouds. A central role is assigned to a set of Petri Net models that specify functional and non-functional requirements. These models support reuse, and they make it possible to combine formal analysis, simulation, and the approximate computation of lower and upper bounds on non-functional requirements when the exact problem is mathematically or computationally intractable. We show how our proposal can assist developers in their design and implementation decisions from a performance perspective. The methodology supports performance analysis at every stage of the engineering process: it allows us to (i) analyse how an application can be mapped onto cloud resources, and (ii) obtain key performance indicators, such as throughput or economic cost, thereby assisting developers in their development tasks and decision making. We illustrate the approach with the pipelined wavefront array.
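
The kind of performance bound mentioned above can be illustrated with one classical technique for timed marked graphs (Petri nets in which every place has exactly one input and one output transition). This is a minimal sketch, not the authors' Petri Net models or tool chain: it assumes a hypothetical marked-graph abstraction of a pipelined wavefront array in which each processing element (PE) is a transition with a given service time, serves one item at a time, and passes results right and down through buffers with a fixed number of free slots. The names `pe`, `elementary_circuits`, `cycle_time_bound`, `wavefront_array`, and `slots` are illustrative only. The bound is the maximum, over all elementary circuits, of the summed transition delays divided by the tokens on that circuit. For deterministic delays this is the exact cycle time of a live, strongly connected marked graph; when random delays are replaced by their means, it is a lower bound on the mean cycle time, so its inverse is an upper bound on throughput.

```python
# Hypothetical sketch: a circuit-based cycle-time bound for a timed marked
# graph. It is NOT the paper's Petri Net model of the wavefront array.
from fractions import Fraction
from typing import Dict, Iterator, List, Set, Tuple

# In a marked graph each place has one input and one output transition, so a
# place is stored as an edge: (source_transition, target_transition, tokens).
Place = Tuple[str, str, int]


def pe(i: int, j: int) -> str:
    """Name of the hypothetical processing element at row i, column j."""
    return f"PE({i},{j})"


def elementary_circuits(nodes: List[str], places: List[Place]) -> Iterator[List[Place]]:
    """Yield each elementary circuit once (exponential in general: small nets only)."""
    out: Dict[str, List[Place]] = {n: [] for n in nodes}
    for p in places:
        out[p[0]].append(p)
    rank = {n: k for k, n in enumerate(nodes)}

    def walk(start: str, node: str, path: List[Place], seen: Set[str]) -> Iterator[List[Place]]:
        for p in out[node]:
            nxt = p[1]
            if nxt == start:
                yield path + [p]
            elif rank[nxt] > rank[start] and nxt not in seen:
                yield from walk(start, nxt, path + [p], seen | {nxt})

    # Rooting every circuit at its lowest-ranked transition avoids duplicates.
    for s in nodes:
        yield from walk(s, s, [], {s})


def cycle_time_bound(delays: Dict[str, Fraction], places: List[Place]) -> Fraction:
    """Max over elementary circuits of (sum of transition delays) / (tokens)."""
    best = Fraction(0)
    for circuit in elementary_circuits(list(delays), places):
        tokens = sum(p[2] for p in circuit)
        if tokens == 0:
            raise ValueError("token-free circuit: the marked graph is not live")
        work = sum((delays[p[0]] for p in circuit), Fraction(0))
        best = max(best, work / tokens)
    return best


def wavefront_array(delays: List[List[int]], slots: int) -> Tuple[Dict[str, Fraction], List[Place]]:
    """Hypothetical marked graph of a rows x cols pipelined wavefront array.

    Each PE has a 1-token self-loop (it serves one item at a time) and sends
    results right and down through a bounded buffer, modelled as a forward
    place (items, 0 tokens) plus a backward place (free slots, `slots` tokens).
    """
    rows, cols = len(delays), len(delays[0])
    d = {pe(i, j): Fraction(delays[i][j]) for i in range(rows) for j in range(cols)}
    places: List[Place] = [(n, n, 1) for n in d]
    for i in range(rows):
        for j in range(cols):
            for ni, nj in ((i, j + 1), (i + 1, j)):
                if ni < rows and nj < cols:
                    places.append((pe(i, j), pe(ni, nj), 0))
                    places.append((pe(ni, nj), pe(i, j), slots))
    return d, places


if __name__ == "__main__":
    bound = cycle_time_bound(*wavefront_array([[2, 5, 3]], slots=1))
    print(f"cycle time >= {bound}, throughput <= {1 / bound}")
```

For a single row of three PEs with delays 2, 5 and 3 and one buffer slot, the binding circuit is the buffer loop between the second and third PE, giving a cycle time of at least 8 time units and a throughput of at most 1/8 items per time unit. Raising `slots` to 2 relaxes the buffer circuits until the slowest PE (delay 5) becomes the bottleneck. Because circuit enumeration is exponential in general, a linear-programming formulation of the same bound would typically be preferred for larger arrays.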
