Management of Data Replication for PC Cluster-based Cloud Storage System

Storage systems are essential building blocks of cloud computing infrastructures. Although high-performance storage servers are the ultimate solution for cloud storage, implementing an inexpensive storage system remains an open issue. To address this problem, an efficient cloud storage system is implemented with inexpensive commodity computer nodes organized into a PC cluster-based datacenter. The Hadoop Distributed File System (HDFS) is an open-source cloud storage platform designed to be deployed on low-cost hardware. The PC cluster-based cloud storage system is implemented on HDFS with an enhanced replication management scheme. Data objects are distributed and replicated across a cluster of commodity nodes located in the cloud. The system provides an optimum replica number as well as weighting and balancing among the storage server nodes. The experimental results show that storage can be balanced according to the available disk space, expected availability, and failure probability of each node in the PC cluster.
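
As an illustration of how a replica count and node weights might be derived from the three quantities named above, the following minimal sketch may help. It is not the scheme evaluated in this work: it assumes independent node failures with a single per-node failure probability, and the function names min_replicas and placement_weights, as well as the weighting rule (free space scaled by survival probability), are hypothetical choices made only for this example.

```python
def min_replicas(failure_prob, target_availability):
    """Smallest replica count r such that 1 - failure_prob**r >= target.

    Assumption (not from the paper): node failures are independent and
    every node shares the same failure probability.
    """
    if not 0.0 < failure_prob < 1.0:
        raise ValueError("failure_prob must be in (0, 1)")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be in (0, 1)")
    r = 1
    # Data is unavailable only if all r replicas fail at the same time.
    while 1.0 - failure_prob ** r < target_availability:
        r += 1
    return r


def placement_weights(nodes):
    """Normalized placement weight per node.

    nodes maps a node name to (free_bytes, failure_prob).
    Hypothetical rule: weight is free space times survival probability,
    so nodes with more free space and fewer failures receive more data.
    """
    raw = {name: free * (1.0 - p) for name, (free, p) in nodes.items()}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no usable capacity in the cluster")
    return {name: w / total for name, w in raw.items()}


if __name__ == "__main__":
    print(min_replicas(0.1, 0.995))
    print(placement_weights({"node1": (100, 0.0), "node2": (300, 0.0)}))
```

Under these assumptions, a node with three times the free space of another receives three times the placement weight when both are equally reliable, and a less reliable cluster requires more replicas to reach the same expected availability.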