Policy-Based QoS Enforcement for Adaptive Big Data Distribution on the Cloud

Big Data distribution has benefited from Cloud resources to accommodate applications' QoS requirements. In this paper, we propose a Big Data distribution scheme that matches the available Cloud resources to the application's QoS requirements, given that the resources of the Cloud infrastructure vary continuously. We develop Two-Level QoS Policies (TLPS) for selecting clusters and nodes while satisfying the QoS of the client's application. We also propose an adaptive data distribution algorithm to cope with changing QoS requirements. Experiments were conducted to evaluate both the effectiveness and the communication overhead of the proposed distribution scheme, and the reported results are favorable. Further experiments compared the TLPS algorithm against other single-based QoS data distribution algorithms, and the results show that the TLPS algorithm adapts to the customer's QoS requirements.
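The two-level selection idea can be illustrated with a minimal sketch. The abstract does not specify the actual TLPS policies, so everything below is an assumption made for illustration: the QoS attributes (bandwidth, latency, free storage, CPU load), the class names, and the simple threshold filtering are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    free_storage_gb: float   # hypothetical node-level attribute
    cpu_load: float          # hypothetical, fraction in [0, 1]


@dataclass
class Cluster:
    name: str
    bandwidth_mbps: float    # hypothetical cluster-level attribute
    latency_ms: float        # hypothetical cluster-level attribute
    nodes: List[Node] = field(default_factory=list)


@dataclass
class QoSRequirements:
    # Illustrative thresholds only; the paper's real policy terms are not given.
    min_bandwidth_mbps: float
    max_latency_ms: float
    min_free_storage_gb: float
    max_cpu_load: float


def select_clusters(clusters: List[Cluster], qos: QoSRequirements) -> List[Cluster]:
    """Level 1: keep clusters whose cluster-level metrics meet the QoS."""
    return [
        c for c in clusters
        if c.bandwidth_mbps >= qos.min_bandwidth_mbps
        and c.latency_ms <= qos.max_latency_ms
    ]


def select_nodes(cluster: Cluster, qos: QoSRequirements) -> List[Node]:
    """Level 2: within one cluster, keep nodes that meet the node-level QoS."""
    return [
        n for n in cluster.nodes
        if n.free_storage_gb >= qos.min_free_storage_gb
        and n.cpu_load <= qos.max_cpu_load
    ]


def two_level_select(clusters: List[Cluster], qos: QoSRequirements) -> Dict[str, List[str]]:
    """Apply both levels and return {cluster name: [eligible node names]}."""
    placement: Dict[str, List[str]] = {}
    for cluster in select_clusters(clusters, qos):
        nodes = select_nodes(cluster, qos)
        if nodes:  # skip clusters that pass level 1 but have no eligible node
            placement[cluster.name] = [n.name for n in nodes]
    return placement
```

Under these assumptions, adapting to a change in QoS requirements amounts to calling `two_level_select` again with an updated `QoSRequirements` object; the adaptive algorithm in the paper may work differently.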
