Cloud Resource Scaling for Time-Bounded and Unbounded Big Data Streaming Applications

Recent advances in technology have produced a deluge of big data streams that must be analyzed in real time under strict latency constraints. A major challenge, however, is determining how many resources the applications processing these streams require, given the high volume, velocity and variety of the data. Most research on resource scaling in the cloud takes the cloud provider's perspective and gives little consideration to multiple resource bottlenecks. We analyze the resource scaling problem from the application provider's point of view so that efficient scaling decisions can be made. This paper makes two contributions to the study of resource scaling for big data streaming applications in the cloud. First, we present a Layered Multi-dimensional Hidden Markov Model (LMD-HMM) for managing time-bounded streaming applications. Second, to cater to unbounded streaming applications, we propose a framework based on a Layered Multi-dimensional Hidden Semi-Markov Model (LMD-HSMM). The parameters of our models are evaluated using modified Forward and Backward algorithms. Our detailed experimental evaluation shows that LMD-HMM is very effective at predicting cloud resource usage for bounded streaming applications that run for shorter periods, while LMD-HSMM accurately predicts resource usage for streaming applications that run for longer periods.
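For readers unfamiliar with the Forward algorithm mentioned above, the following is a minimal sketch of the standard textbook forward recursion for a plain, single-layer discrete HMM. It is not the modified, layered multi-dimensional variant used in this paper. The function name `forward_likelihood`, the two hidden states (low and high load), the two observation symbols (low and high CPU reading) and all probabilities are hypothetical values chosen only for illustration.

```python
from typing import List, Sequence


def forward_likelihood(
    pi: Sequence[float],
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return P(obs | model) for a discrete HMM via the forward recursion.

    pi[i]   : probability of starting in hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    obs     : observed symbol indices, one per time step
    """
    if not obs:
        raise ValueError("obs must contain at least one observation")
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha: List[float] = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: sum over the final forward variables
    return sum(alpha)


# Hypothetical model: state 0 = low load, state 1 = high load;
# symbol 0 = low CPU reading, symbol 1 = high CPU reading.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]

if __name__ == "__main__":
    print(forward_likelihood(PI, A, B, [0, 0, 1, 1]))
```

This version omits the scaling of the forward variables, so for long observation sequences the products can underflow; a practical implementation would normalise `alpha` at each step or work in log space.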
