Survey of Caching Mechanism in Hadoop

Hadoop is an open-source Apache project that can run in standalone mode, on a single node, or on a multi-node cluster. Hadoop has evolved from version 1.x to 2.x, which introduced MapReduce v2 and YARN. Hadoop consists mainly of HDFS and MapReduce: HDFS provides the storage layer, and MapReduce is the programming model for processing data files. Hadoop also offers centralized cache management, which is useful for files that are accessed repeatedly. The distributed cache copies files to every node, so map and reduce tasks read them from the local file system. The data stays accessible at any time because it is distributed across multiple locations, stored as fixed-size blocks on the DataNodes.
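
The distributed cache behaviour described above can be illustrated with a small simulation. The sketch below does not use the Hadoop API: temporary directories stand in for cluster nodes, and the names `distribute_to_nodes`, `map_task`, `run_demo`, and the `countries.csv` lookup file are hypothetical, chosen only for illustration. It shows the two steps named in the abstract: a side file is copied once to every node, and each map task then reads its node-local copy instead of the shared source.

```python
import shutil
import tempfile
from pathlib import Path


def distribute_to_nodes(cache_file: Path, node_dirs: list[Path]) -> list[Path]:
    """Copy cache_file into every node's local directory, once per node."""
    local_copies = []
    for node_dir in node_dirs:
        node_dir.mkdir(parents=True, exist_ok=True)
        local_copy = node_dir / cache_file.name
        shutil.copy(cache_file, local_copy)
        local_copies.append(local_copy)
    return local_copies


def map_task(records: list[str], local_cache: Path) -> list[tuple[str, str]]:
    """Join each record key against the lookup table read from the node-local copy."""
    lookup = dict(
        line.split(",", 1)
        for line in local_cache.read_text().splitlines()
        if line
    )
    return [(key, lookup.get(key, "UNKNOWN")) for key in records]


def run_demo() -> list[list[tuple[str, str]]]:
    """Simulate three nodes, each running one map task on its own input split."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # The shared side file (stands in for a file kept in HDFS).
        shared = root / "shared" / "countries.csv"
        shared.parent.mkdir()
        shared.write_text("in,India\nus,United States\n")

        # Step 1: copy the file to every (simulated) node.
        nodes = [root / f"node{i}" for i in range(3)]
        copies = distribute_to_nodes(shared, nodes)

        # Step 2: each map task reads only its node-local copy.
        splits = [["in", "us"], ["us"], ["fr"]]
        return [map_task(split, copy) for split, copy in zip(splits, copies)]


if __name__ == "__main__":
    for i, output in enumerate(run_demo()):
        print(f"node{i}: {output}")
```

In a real MapReduce job the copying is handled by the framework rather than by user code; the simulation only mirrors the read-from-local-disk pattern, which is why a small, frequently read lookup file is a typical candidate for the distributed cache.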
