Paper information - Pursuit of Data Aware Caching for Big-Data using MapReduce Framework Proposal

The buzzword "big data" refers to large-scale distributed data processing applications that operate on exceptionally large volumes of data. For instance, big data is commonly unstructured and requires more real-time analysis. This calls for new system architectures for data acquisition, storage, transmission, and large-scale data processing. Google's MapReduce and its open-source implementation, Apache Hadoop, have become the principal software systems for big-data applications. One observation about the MapReduce framework is that it generates a large amount of intermediate data. In this work, we propose a data-aware caching framework for big-data applications in which tasks submit their intermediate results to a cache manager, and a task queries the cache manager before executing the actual computing work. A novel cache description scheme and a cache request and reply protocol are designed and adopted. We implemented the data-aware cache by extending Hadoop. Experimental results show that the data-aware cache significantly improves the completion time of MapReduce jobs.
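The query-before-compute flow described in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' Hadoop extension. The class `CacheManager`, the helper `run_map_task`, and the choice of an (input split id, operation name) pair as the cache description are all hypothetical names and assumptions made only for this illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumed cache description: (input split id, operation name).
# The abstract does not specify the actual scheme.
CacheKey = Tuple[str, str]


class CacheManager:
    """Hypothetical in-memory stand-in for the cache manager."""

    def __init__(self) -> None:
        self._store: Dict[CacheKey, dict] = {}
        self.hits = 0
        self.misses = 0

    def request(self, key: CacheKey):
        """Cache request: return (found, cached_result)."""
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        self.misses += 1
        return False, None

    def submit(self, key: CacheKey, result: dict) -> None:
        """A task submits its intermediate result after computing it."""
        self._store[key] = result


def run_map_task(
    manager: CacheManager,
    split_id: str,
    operation: str,
    compute: Callable[[List[str]], dict],
    data: List[str],
) -> dict:
    """Query the cache manager first; compute only on a miss."""
    key = (split_id, operation)
    found, cached = manager.request(key)
    if found:
        return cached
    result = compute(data)
    manager.submit(key, result)
    return result


def word_count(lines: List[str]) -> dict:
    """Example map-side computation producing intermediate data."""
    counts: dict = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    manager = CacheManager()
    data = ["a b a", "b c"]
    run_map_task(manager, "split-0", "word_count", word_count, data)
    run_map_task(manager, "split-0", "word_count", word_count, data)
    print(manager.hits, manager.misses)
```

In this sketch the second call with the same split and operation is answered from the cache, so `word_count` runs only once. A real deployment would also have to handle cache invalidation and distribution across nodes, which the sketch leaves out.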

P. G. Scholar | B. G. Nagar
