Minimal Cost Data Sets Storage in the Cloud

Cloud computing systems offer massive computation power and storage capacity, which allows scientists to deploy computation- and data-intensive applications without investing in infrastructure. The cloud can also be used to store large application data sets. Storage strategies and benchmarking approaches based on the pay-as-you-go model have been developed to store large volumes of generated data sets cost-effectively in clouds. However, they are either impractical at run time or not sufficiently cost-effective. In this paper, a novel, highly cost-effective and practical storage strategy is proposed that aims to achieve the minimum cost benchmark. The strategy automatically decides at run time whether or not a generated data set should be stored. Its primary objective is local optimization of the trade-off between computation and storage. Its secondary objective is to take users' storage preferences into account. In this paper we manage the storage of both original and generated data, and we use data compression to make data storage in the cloud more cost-effective.
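The abstract does not spell out the cost model, so the following is only a minimal sketch under stated assumptions of how a run-time store-or-delete decision could trade storage against computation. It is not the proposed strategy. It uses a per-data-set cost-rate comparison over a linear chain of derived data sets. Storing costs (size / compression ratio) × storage price per day. Deleting costs the regeneration cost (including any deleted ancestors back to the last stored one) × expected uses per day. All names (`DataSet`, `decide_storage`, `price_per_gb_day`, `compression_ratio`, `lambda_`) are hypothetical. `lambda_` is an assumed stand-in for a user storage preference, and `compression_ratio` ignores the compute cost of compressing and decompressing.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float       # uncompressed size of the data set in GB
    gen_cost: float      # compute cost ($) to regenerate it from its direct predecessor
    uses_per_day: float  # expected access frequency (hypothetical estimate)


def decide_storage(chain, price_per_gb_day, compression_ratio=1.0, lambda_=1.0):
    """Greedy, left-to-right store/delete decision over a linear chain of data sets.

    Hypothetical model: a data set is stored when its (compressed) storage cost
    rate is at most lambda_ times the cost rate of deleting it and regenerating
    it on each use. lambda_ > 1 biases toward storing (a stand-in for user
    preference). Effects of a decision on later data sets are not considered,
    so this is a local heuristic, not a global optimum.
    """
    decisions = {}
    pending_regen = 0.0  # compute cost of deleted ancestors back to the last stored one
    for d in chain:
        # Regenerating d means first rebuilding every deleted ancestor in the run.
        regen = pending_regen + d.gen_cost
        delete_rate = regen * d.uses_per_day  # $/day if d is deleted
        store_rate = d.size_gb / compression_ratio * price_per_gb_day  # $/day if kept
        if store_rate <= lambda_ * delete_rate:
            decisions[d.name] = "store"
            pending_regen = 0.0  # later data sets can be rebuilt from d
        else:
            decisions[d.name] = "delete"
            pending_regen = regen  # later data sets must also rebuild d
    return decisions


chain = [
    DataSet("d1", 100, 2.0, 0.01),
    DataSet("d2", 10, 5.0, 0.5),
    DataSet("d3", 50, 1.0, 0.1),
]
print(decide_storage(chain, price_per_gb_day=0.005))
print(decide_storage(chain, price_per_gb_day=0.005, compression_ratio=4.0))
```

With these made-up numbers, the rarely used data set `d1` is deleted, and the frequently used `d2` is stored even though regenerating it would also require rebuilding `d1`. A 4:1 compression ratio lowers the storage cost rate of `d3` enough to flip it from "delete" to "store". This illustrates why compression can shift the computation/storage trade-off.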
