Hadoop Approach to Cluster Based Cache Oblivious Peano Curves

Hadoop is one of the most widely used technologies in the big data landscape, processing data through the Hadoop Distributed File System (HDFS) and MapReduce. Large problems are increasingly difficult to handle on a single machine, because their execution time on such a platform is very high. Processing the tasks in parallel through MapReduce, rather than sequentially, can be expected to give better efficiency. In the present method, the map task first decomposes the input into intermediate key-value pairs, which are then sent to the reduce function for processing. The matrix multiplication algorithm used is cache oblivious, so that it makes better use of the memory hierarchy. The cache-oblivious approach increases the reuse of elements already held in cache and thus decreases the overall execution time. The proposed matrix multiplication is also fault tolerant, because the data is replicated three times, on three different data nodes.
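The map and reduce steps described above can be illustrated with a minimal single-process sketch in Python. It is not the authors' implementation: it does not use Hadoop, and it does not reproduce the cache-oblivious Peano-curve ordering. The function names (`map_a`, `map_b`, `shuffle`, `reduce_cell`, `multiply`) and the keying scheme, one intermediate key per output cell, are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_a(i, j, value, n_cols_b):
    """Map one element A[i][j]: it contributes to every output cell in row i."""
    for k in range(n_cols_b):
        yield (i, k), ("A", j, value)


def map_b(j, k, value, n_rows_a):
    """Map one element B[j][k]: it contributes to every output cell in column k."""
    for i in range(n_rows_a):
        yield (i, k), ("B", j, value)


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_cell(values):
    """Reduce one output cell: pair A and B entries on the shared index j and sum the products."""
    a = {j: v for tag, j, v in values if tag == "A"}
    b = {j: v for tag, j, v in values if tag == "B"}
    return sum(a[j] * b[j] for j in a if j in b)


def multiply(A, B):
    """Multiply two dense matrices (lists of rows) with the map/shuffle/reduce steps above."""
    n_rows_a, n_inner, n_cols_b = len(A), len(B), len(B[0])
    pairs = []
    for i in range(n_rows_a):
        for j in range(n_inner):
            pairs.extend(map_a(i, j, A[i][j], n_cols_b))
    for j in range(n_inner):
        for k in range(n_cols_b):
            pairs.extend(map_b(j, k, B[j][k], n_rows_a))
    groups = shuffle(pairs)
    return [[reduce_cell(groups[(i, k)]) for k in range(n_cols_b)]
            for i in range(n_rows_a)]


if __name__ == "__main__":
    print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In a real Hadoop job the grouping done by `shuffle` is performed by the framework, and each call to `reduce_cell` could run on a different node. A cache-oblivious variant would additionally change the order in which blocks of the matrices are visited, which this sketch leaves out.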
