Implementation of Ceph Storage with Big Data for Performance Comparison

Highly available shared storage has become an important resource for scaling systems, especially Big Data deployments. To reduce the risk of data corruption and to improve storage read and write performance, this work applies Ceph storage to Big Data performance testing, with the aim of finding the setup that gives the best read and write speed together with data backup. The system starts from Hadoop operations. Data is stored in the Hadoop Distributed File System (HDFS) and copied to Alluxio MEM space. The data is then processed by MapReduce (mapping, sorting, filtering, reducing), and the output is stored in Alluxio MEM space. In the first experiment, we use the S3 API and the RADOS Gateway component of Ceph as a bridge between Alluxio and the Object Storage Daemons (OSDs). The second experiment uses the same environment as the first, but the MapReduce output is connected directly to the OSDs through the Ceph File System (CephFS). Data is safer in Ceph than in Alluxio MEM alone, because the OSDs back up the data at the object storage level. We can also use an S3 browser (GUI) to maintain the data on the OSDs, e.g. to grant access, keep folders, create user accounts, and move data between locations. Finally, we use Inkscope to monitor the whole system; if a problem occurs, the system reports the error or raises a warning alert to the user.
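
The MapReduce stage named in the abstract (mapping, sorting, filtering, reducing) can be illustrated with a minimal in-memory sketch. This is not the authors' Hadoop job: the word-count task, the function names, and the `min_length` filter are all assumptions chosen for illustration, and in a real Hadoop job the sort is performed by the framework between the map and reduce phases rather than by user code.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def sort_phase(pairs):
    """Order the intermediate pairs by key, as a shuffle/sort step would."""
    return sorted(pairs, key=lambda kv: kv[0])


def filter_phase(pairs, min_length=2):
    """Drop keys shorter than min_length (a hypothetical filter rule)."""
    return [(key, value) for key, value in pairs if len(key) >= min_length]


def reduce_phase(pairs):
    """Sum the values of all pairs that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines, min_length=2):
    """Chain the four phases in the order given in the abstract."""
    mapped = map_phase(lines)
    ordered = sort_phase(mapped)
    kept = filter_phase(ordered, min_length)
    return reduce_phase(kept)


if __name__ == "__main__":
    sample = ["ceph stores a block", "ceph stores an object"]
    print(run_job(sample))
    # {'an': 1, 'block': 1, 'ceph': 2, 'object': 1, 'stores': 2}
```

In the described system, the input would come from HDFS or Alluxio MEM space and the result would be written back to Alluxio or CephFS; the sketch keeps everything in memory so that only the order of the processing steps is shown.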
