Cache utilization for enhancing the analysis of Big Data & increasing the performance of Hadoop

Our world generates many different kinds of data, which can be analyzed and processed for valuable information. Traditional systems such as relational databases, which have long been used for storage and processing, fail to handle data sets that range into terabytes and petabytes, commonly known as Big Data. Many tools exist for analyzing Big Data; Apache Hadoop is one of the most widely used frameworks. Hadoop relies on a large collection of libraries to manage Big Data processing, and it also handles the different kinds of failures that may occur in the system. It uses the MapReduce programming paradigm for the distributed storage, processing, and analysis of Big Data: the data set is divided into blocks that are distributed across the network. Mapper functions run in parallel on each block, parsing it to filter out the required data for further processing. The reducer function accepts the output of the mapper functions and processes it to produce the expected results. It has been observed that the intermediate data generated by the mappers while processing the same Big Data set is always the same; the system therefore performs redundant operations and regenerates identical results, which wastes resources and slows the system down. The proposed system introduces a novel cache that stores this intermediate data, i.e., the mappers' output. Whenever the system needs to analyze the same Big Data set again, it fetches the already processed data from this cache rather than re-running the mapper functions over the whole data set.
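To make the mapper/reducer division of labor concrete, the following is a minimal sketch of the standard Hadoop MapReduce pattern, using the classic word-count workload as a stand-in for the parsing and filtering described above; the class names TokenCountMapper and TokenCountReducer are illustrative, not part of the proposed system.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: runs in parallel on each block of the data set and parses it,
// emitting intermediate (word, 1) pairs for further processing.
public class TokenCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate data produced by the mapper
        }
    }
}

// Reducer: accepts the intermediate pairs from all mappers and aggregates
// them into the final result.
class TokenCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

It is this intermediate (key, value) output of the map phase that is identical across repeated jobs over the same input blocks, and that the proposed cache targets.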
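The novel cache itself is not part of stock Hadoop, so the sketch below is only a self-contained illustration of the idea under simple assumptions: each input block is identified by a digest of its contents, and the mapper's intermediate output is stored in an in-memory map keyed by that digest. MapOutputCache, mapWithCache, and runMapper are hypothetical names introduced here for illustration; a real deployment would persist the cache (e.g., in HDFS) rather than in process memory.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache of mapper output, keyed by a digest of the input block:
// when the same Big Data block is analyzed again, the stored intermediate
// pairs are returned instead of re-running the map function.
public class MapOutputCache {
    private final Map<String, List<Map.Entry<String, Integer>>> cache =
            new ConcurrentHashMap<>();

    // A digest of the raw block contents identifies "the same Big Data set".
    private static String digest(byte[] block) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(sha.digest(block));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Cache hit: return stored intermediate data.
    // Cache miss: run the mapper once and store its output for future jobs.
    public List<Map.Entry<String, Integer>> mapWithCache(byte[] block) {
        return cache.computeIfAbsent(digest(block), key -> runMapper(block));
    }

    // Stand-in for the real map function: emit (token, 1) for every token.
    private static List<Map.Entry<String, Integer>> runMapper(byte[] block) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : new String(block, StandardCharsets.UTF_8).split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        MapOutputCache cache = new MapOutputCache();
        byte[] block = "to be or not to be".getBytes(StandardCharsets.UTF_8);
        cache.mapWithCache(block);                            // miss: mapper runs
        System.out.println(cache.mapWithCache(block).size()); // hit: served from cache
    }
}
```

Keying on a content digest rather than a file name means the redundant map work is skipped whenever the bytes are the same, which is exactly the redundancy the abstract identifies.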