Cloud Resource Scaling for Big Data Streaming Applications Using a Layered Multi-dimensional Hidden Markov Model

Recent advancements in technology have led to a deluge of data that require real-time analysis with strict latency constraints. A major challenge, however, is determining the amount of resources required by big data stream processing applications in response to heterogeneous data sources, streaming events, unpredictable data volume and velocity changes. Over-provisioning of resources for peak loads can be wasteful while under-provisioning can have a huge impact on the performance of the streaming applications. The majority of research efforts on resource scaling in the cloud are investigated from the cloud provider's perspective, they focus on web applications and do not consider multiple resource bottlenecks. We aim at analyzing the resource scaling problem from a big data streaming application provider's point of view such that efficient scaling decisions can be made for future resource utilization. This paper proposes a Layered Multi-dimensional Hidden Markov Model (LMD-HMM) for facilitating the management of resource auto-scaling for big data streaming applications in the cloud. Our detailed experimental evaluation shows that LMD-HMM performs best with an accuracy of 98%, outperforming the single-layer hidden markov model.

[1]  Keqiu Li,et al.  Big Data Processing in Cloud Computing Environments , 2012, 2012 12th International Symposium on Pervasive Systems, Algorithms and Networks.

[2]  Zhenhuan Gong,et al.  PRESS: PRedictive Elastic ReSource Scaling for cloud systems , 2010, 2010 International Conference on Network and Service Management.

[3]  Fatos Xhafa,et al.  Processing and Analytics of Big Data Streams with Yahoo!S4 , 2015, 2015 IEEE 29th International Conference on Advanced Information Networking and Applications.

[4]  Zhiliang Zhu,et al.  Dynamic Provisioning Modeling for Virtualized Multi-tier Applications in Cloud Data Center , 2010, 2010 IEEE 3rd International Conference on Cloud Computing.

[5]  Abdul Majid Mazlina,et al.  Big Data Processing in Cloud Computing Environments , 2017 .

[6]  Rajiv Ranjan,et al.  Streaming Big Data Processing in Datacenter Clouds , 2014, IEEE Cloud Computing.

[7]  José Antonio Lozano,et al.  A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments , 2014, Journal of Grid Computing.

[8]  Ying Xing,et al.  The Design of the Borealis Stream Processing Engine , 2005, CIDR.

[9]  Waheed Iqbal,et al.  Adaptive resource provisioning for read intensive multi-tier applications in the cloud , 2011, Future Gener. Comput. Syst..

[10]  Philip S. Yu,et al.  SPADE: the system s declarative stream processing engine , 2008, SIGMOD Conference.

[11]  Aniruddha S. Gokhale,et al.  Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting , 2011, 2011 IEEE 4th International Conference on Cloud Computing.

[12]  Randy H. Katz,et al.  A view of cloud computing , 2010, CACM.

[13]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[14]  Michael I. Gordon,et al.  Exploiting coarse-grained task, data, and pipeline parallelism in stream programs , 2006, ASPLOS XII.

[15]  Le Yi Wang,et al.  VCONF: a reinforcement learning approach to virtual machines auto-configuration , 2009, ICAC '09.

[16]  Xiaohui Gu,et al.  CloudScale: elastic resource scaling for multi-tenant cloud systems , 2011, SoCC.

[17]  Qian Zhu,et al.  Dynamic Resource Provisioning for Data Streaming Applications in a Cloud Environment , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[18]  Moustafa Ghanem,et al.  Lightweight Resource Scaling for Cloud Applications , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).

[19]  Jennifer Widom,et al.  STREAM: The Stanford Stream Data Manager , 2003, IEEE Data Eng. Bull..

[20]  Jay Kreps,et al.  Kafka : a Distributed Messaging System for Log Processing , 2011 .

[21]  Chung-Horng Lung,et al.  Cloud Resource Auto-scaling System Based on Hidden Markov Model (HMM) , 2014, 2014 IEEE International Conference on Semantic Computing.

[22]  Marc Parizeau,et al.  Training Hidden Markov Models with Multiple Observations-A Combinatorial Method , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Kirk P. Arnett,et al.  The size of the IT job market , 2008, CACM.

[24]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[25]  Jignesh M. Patel,et al.  Storm@twitter , 2014, SIGMOD Conference.