Efficient Utilization of Profiles to Reduce Time in Very Large Data Sets

Hadoop is a software framework for the analysis of large data sets. The Hadoop Distributed File System (HDFS) and the MapReduce paradigm provide an efficient way to deal with the terabytes of data being produced every second. MapReduce is a popular way to process data in cloud environments because of its excellent scalability and good fault tolerance. However, creating profiles for the same job again and again makes the process less efficient. This paper proposes an INTERFACE that reduces the time taken to match sampled MapReduce jobs (Js) with already created profiles. It acts as a mediator between the profile store and the worker nodes.
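The abstract does not specify how a sampled job is matched against stored profiles. Purely as an illustration, the Python sketch below assumes a nearest-neighbour match on a numeric feature vector with a distance threshold. On a close match the interface reuses the stored profile. Otherwise it builds a new profile once and saves it. The class names, the feature vector, the threshold and `expensive_profiling` are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass(frozen=True)
class JobSample:
    """Hypothetical features collected from a sampled MapReduce job (Js)."""
    job_name: str
    features: Tuple[float, ...]  # assumed numeric features, e.g. selectivity, cost per record

@dataclass
class ProfileStore:
    """In-memory stand-in for the profile store: a list of (features, profile) pairs."""
    entries: List[Tuple[Tuple[float, ...], dict]] = field(default_factory=list)

class ProfileInterface:
    """Hypothetical mediator between worker nodes and the profile store."""

    def __init__(self, store: ProfileStore,
                 build_profile: Callable[[JobSample], dict],
                 threshold: float = 0.1) -> None:
        self.store = store
        self.build_profile = build_profile  # expensive: runs the job with profiling on
        self.threshold = threshold          # assumed maximum distance for a "match"
        self.profiles_built = 0

    def _closest(self, sample: JobSample) -> Tuple[Optional[dict], float]:
        # Linear scan for the stored profile whose features are nearest (Euclidean).
        best, best_dist = None, math.inf
        for feats, profile in self.store.entries:
            dist = math.dist(feats, sample.features)
            if dist < best_dist:
                best, best_dist = profile, dist
        return best, best_dist

    def get_profile(self, sample: JobSample) -> dict:
        profile, dist = self._closest(sample)
        if profile is not None and dist <= self.threshold:
            return profile  # close enough: reuse instead of re-profiling
        profile = self.build_profile(sample)  # no match: profile once and store it
        self.profiles_built += 1
        self.store.entries.append((sample.features, profile))
        return profile

def expensive_profiling(sample: JobSample) -> dict:
    # Placeholder for actually running the job under full profiling.
    return {"job": sample.job_name, "features": sample.features}

iface = ProfileInterface(ProfileStore(), expensive_profiling, threshold=0.1)
p1 = iface.get_profile(JobSample("wordcount", (0.9, 0.2)))
p2 = iface.get_profile(JobSample("wordcount", (0.92, 0.21)))  # distance ~0.022: reused
p3 = iface.get_profile(JobSample("sort", (0.1, 0.8)))         # distance 1.0: new profile
print(iface.profiles_built)  # prints 2
```

The linear scan keeps the sketch short. A real profile store with many entries would more likely use an index or a coarse filter (for example on job name) before computing distances.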