Big Data Processing using Apache Hadoop in Cloud System

The continual growth of technology has created a need to store and process extremely large volumes of data in the cloud. The worldwide volume of data, already enormous, was projected to grow more than 650-fold by the year 2014, with 85% of it unstructured; this is known as the 'Big Data' problem. To address it, Hadoop-based techniques are presented together with an efficient resource-scheduling method and a probabilistic redundant-scheduling method, which allow the system to organize the "free" computer storage resources already present within enterprises into a low-cost, high-quality storage service. The proposed methods and system, prototyped on a Linux-based cloud, provide a useful reference for implementing cloud storage systems.
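The abstract names Hadoop as the processing engine but does not reproduce any code. As a minimal, self-contained sketch of how a Hadoop MapReduce job is structured (this is the canonical word-count example, not the authors' scheduling method; the WordCount class name and the input/output paths taken from args are illustrative), the job below shows the map, reduce, and driver pieces that such a system would build on:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configure and submit the job; args[0] is the input
  // directory and args[1] the output directory, both on HDFS.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Once packaged into a jar, a job like this is submitted to the cluster with, for example, `hadoop jar wordcount.jar WordCount /input /output` (paths illustrative); the framework then handles the data distribution and task scheduling that the abstract's resource-scheduling method would refine.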
