Cache utilization for enhancing the analysis of Big Data & increasing the performance of Hadoop

Our world generates many different kinds of data, which can be analyzed and processed for valuable information. Traditional systems such as relational databases, which have long been used for storage and processing, fail to handle data sets that range into terabytes and petabytes, commonly known as Big Data. Many tools exist for analyzing Big Data; Apache Hadoop is one of the most widely used frameworks. Hadoop relies on a large collection of libraries to manage Big Data processing, and it also handles the different kinds of failures that may occur in the system. It uses the MapReduce programming paradigm for the distributed storage, processing, and analysis of Big Data: the data set is divided into blocks that are distributed across the network. Mapper functions run in parallel on each block, parsing it to filter out the required data for further processing. The reducer function accepts the output of the mapper functions and processes it to produce the expected results. It has been observed that the intermediate data generated by the mappers while processing the same Big Data set is always the same; the system therefore performs redundant operations and regenerates identical results, which wastes resources and slows the system down. The proposed system introduces a novel cache that stores this intermediate data, i.e., the mappers' output. Whenever the system needs to analyze the same Big Data set again, it fetches the already processed data from this cache rather than re-running the mapper functions over the whole data set.
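To make the mapper/reducer division of labor concrete, the following is a minimal sketch of the standard Hadoop MapReduce pattern, using the classic word-count workload as a stand-in for the parsing and filtering described above; the class names TokenCountMapper and TokenCountReducer are illustrative, not part of the proposed system.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: runs in parallel on each block of the data set and parses it,
// emitting intermediate (word, 1) pairs for further processing.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate data produced by the mapper
        }
    }
}

// Reducer: accepts the intermediate pairs from all mappers and aggregates
// them into the final result.
class TokenCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

It is this intermediate (key, value) output of the map phase that is identical across repeated jobs over the same input blocks, and that the proposed cache targets.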
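The novel cache itself is not part of stock Hadoop, so the sketch below is only a self-contained illustration of the idea under simple assumptions: each input block is identified by a digest of its contents, and the mapper's intermediate output is stored in an in-memory map keyed by that digest. MapOutputCache, mapWithCache, and runMapper are hypothetical names introduced here for illustration; a real deployment would persist the cache (e.g., in HDFS) rather than in process memory.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache of mapper output, keyed by a digest of the input block:
// when the same Big Data block is analyzed again, the stored intermediate
// pairs are returned instead of re-running the map function.
public class MapOutputCache {
    private final Map<String, List<Map.Entry<String, Integer>>> cache =
            new ConcurrentHashMap<>();

    // A digest of the raw block contents identifies "the same Big Data set".
    private static String digest(byte[] block) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(sha.digest(block));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cache hit: return stored intermediate data.
    // Cache miss: run the mapper once and store its output for future jobs.
    public List<Map.Entry<String, Integer>> mapWithCache(byte[] block) {
        return cache.computeIfAbsent(digest(block), key -> runMapper(block));
    }

    // Stand-in for the real map function: emit (token, 1) for every token.
    private static List<Map.Entry<String, Integer>> runMapper(byte[] block) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : new String(block, StandardCharsets.UTF_8).split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        MapOutputCache cache = new MapOutputCache();
        byte[] block = "to be or not to be".getBytes(StandardCharsets.UTF_8);
        cache.mapWithCache(block);                            // miss: mapper runs
        System.out.println(cache.mapWithCache(block).size()); // hit: served from cache
    }
}
```

Keying on a content digest rather than a file name means the redundant map work is skipped whenever the bytes are the same, which is exactly the redundancy the abstract identifies.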