Paper Information - Efficiently Scheduling Hadoop Cluster in Cloud Environment

Efficiently Scheduling Hadoop Cluster in Cloud Environment

Today, most real-time applications, such as bioinformatics and image processing, involve processing large amounts of unstructured data, which demands fast, high-memory, and highly efficient resources. Cloud computing addresses this need and is now the most favored option for big-data analytics. Hadoop, a framework for manipulating unstructured data, is used for this purpose. The nodes that form a Hadoop cluster are scheduled randomly in the Amazon cloud. Because huge amounts of data must be transferred among these nodes, the time taken to upload and process the data is quite high, which decreases performance. Service providers also focus on maximizing resource utilization and minimizing power consumption. This chapter aims to design an energy-efficient scheduler for a cloud environment that is suitable for big-data applications. The scheduler has been tested in an OpenStack cloud environment.
