A model to compare cloud and non-cloud storage of Big Data

When comparing Cloud and non-Cloud storage, it can be difficult to ensure that the comparison is fair. In this paper we examine the process of setting up such a comparison and the metric used. Performance comparisons of Cloud and non-Cloud systems, deployed for biomedical scientists, were conducted to identify improvements in efficiency and performance. Prior to the experiments, network latency, file size and job failures were identified as factors that degrade performance, and experiments were conducted to understand their impact. Organizational Sustainability Modeling (OSM) is used before, during and after the experiments to ensure that fair comparisons are achieved. OSM defines the actual and expected execution time and the risk control rates, and is used to understand key outputs of both the Cloud and non-Cloud experiments. Forty experiments on both Cloud and non-Cloud systems were undertaken across two case studies. The first case study focused on transferring and backing up 10,000 files of 1 GB each, and the second on transferring and backing up 1,000 files of 10 GB each. The results showed that, first, the actual and expected execution times were lower on the Cloud than on the non-Cloud system. Second, there was more than 99% consistency between the actual and expected execution time on the Cloud, while no comparable consistency was found on the non-Cloud system. Third, the improvement in efficiency was higher on the Cloud than on the non-Cloud system. OSM is the metric used to analyze the collected data, and it provides synthesis and insights for the data analysis and visualization of the two case studies.
Organizational Sustainability Modeling (OSM) compares Cloud and non-Cloud storage. We identify factors that affect performance and design ways to make fair comparisons. We explain how to use OSM, including its definitions, inputs and outputs. We present two case studies of Big Data storage, supported by 40 runs. Results are analyzed and presented with data analysis and visualization.
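
As a rough illustration of the quantities mentioned in the abstract, the following Python sketch computes the total data volume of each case study and a simple agreement score between actual and expected execution time. The abstract does not give the OSM formulas, so `consistency_percent` is a hypothetical measure chosen only for illustration (one minus the relative deviation from the expected time), not the paper's definition; the function and variable names, and the sample timings in the usage line, are likewise assumptions.

```python
def total_volume_gb(file_count: int, file_size_gb: int) -> int:
    """Total data volume of a case study in GB."""
    return file_count * file_size_gb


def consistency_percent(actual_s: float, expected_s: float) -> float:
    """Hypothetical agreement score between actual and expected execution time.

    Assumption: consistency = 100 * (1 - |actual - expected| / expected).
    This is NOT taken from the paper; it is one plausible way to express
    how closely a measured run matches its expected duration.
    """
    if expected_s <= 0:
        raise ValueError("expected execution time must be positive")
    return 100.0 * (1.0 - abs(actual_s - expected_s) / expected_s)


# File counts and sizes as stated in the abstract.
case_studies = {
    "case_study_1": (10_000, 1),   # 10,000 files of 1 GB each
    "case_study_2": (1_000, 10),   # 1,000 files of 10 GB each
}

volumes = {
    name: total_volume_gb(count, size_gb)
    for name, (count, size_gb) in case_studies.items()
}

# Both case studies move the same total volume (10,000 GB), so they
# differ only in file count and file size.
for name, volume in volumes.items():
    print(f"{name}: {volume} GB")

# Made-up timings, purely to show how the score would be used.
print(consistency_percent(actual_s=995.0, expected_s=1000.0))
```

Because both case studies transfer the same total volume, any difference in their execution times would reflect the effect of file count and file size rather than the amount of data.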
