Pursuit of Data Aware Caching for Big-Data using MapReduce Framework Proposal

The buzzword big data refers to large-scale distributed data processing applications that operate on exceptionally large volumes of data. For example, big data is commonly unstructured and requires more real-time analysis. This calls for new system architectures for data acquisition, storage, transmission, and large-scale data processing. Google's MapReduce and Apache Hadoop, its open-source implementation, have become the principal software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. In this proposal, we propose a data-aware caching framework for big-data applications in which tasks submit their intermediate results to a cache manager. A task then queries the cache manager before executing the actual computing work, so that a previously stored result can be reused instead of recomputed. A novel cache description scheme and a cache request and reply protocol are designed and adopted. We implemented the data-aware caching technique by extending Hadoop. Experimental results show that data-aware caching significantly improves the completion time of MapReduce jobs.
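
The query-before-compute flow described above can be illustrated with a minimal, single-process sketch. It is not the Hadoop extension itself: the names CacheManager, describe, run_task, and word_count are hypothetical, and the cache description used here (a hash of the input split paired with an operation name) is only an assumed stand-in for the cache description scheme mentioned in the abstract.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple

# Assumed cache description: (hash of the input data, name of the operation).
CacheKey = Tuple[str, str]


def describe(input_split: bytes, operation: str) -> CacheKey:
    """Build a hypothetical cache description for a task's intermediate result."""
    origin = hashlib.sha256(input_split).hexdigest()
    return (origin, operation)


class CacheManager:
    """In-memory stand-in for the cache manager that tasks query and submit to."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, object] = {}
        self.hits = 0
        self.misses = 0

    def request(self, key: CacheKey) -> Tuple[bool, Optional[object]]:
        """Cache request: reply with (found, result)."""
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        self.misses += 1
        return False, None

    def submit(self, key: CacheKey, result: object) -> None:
        """Store an intermediate result under its description."""
        self._store[key] = result


def run_task(manager: CacheManager, input_split: bytes, operation: str,
             compute: Callable[[bytes], object]) -> object:
    """Query the cache manager first; compute and submit only on a miss."""
    key = describe(input_split, operation)
    found, cached = manager.request(key)
    if found:
        return cached
    result = compute(input_split)
    manager.submit(key, result)
    return result


def word_count(split: bytes) -> Dict[str, int]:
    """Example map-side computation: count words in one input split."""
    counts: Dict[str, int] = {}
    for word in split.decode("utf-8").split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

In this sketch, running the same operation on the same split twice computes the result once and serves the second call from the cache, while a different operation on the same split is a miss. A real deployment would also need distribution across nodes, eviction, and invalidation, none of which is modeled here.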