A Hot-Data Replication Scheme Based on Data Access Patterns for Enhancing Processing Speed of MapReduce

In recently years, with the growth of social media and the development of mobile devices, the data have been significantly increased. Hadoop has been widely utilized as a typical distributed storage and processing framework. The tasks in Mapreduce based on the Hadoop distributed file system are allocated to the map as close as possible by considering the data locality. However, there are data being requested frequently according to the data analysis tasks of Mapreduce. In this paper, we propose a hot-data replication mechanism to improve the processing speed of Mapreduce according to data access patterns. The proposed scheme reduces the task processing time and improves the data locality using the replica optimization algorithm on the high access frequency of hot data. It is shown through performance evaluation that the proposed scheme outperforms the existing scheme in terms of the load of access frequency.