Eliminating Homogeneous Cluster Setup for Efficient Parallel Data Processing

This project proposes eliminating the homogeneous cluster setup in parallel data processing environments. A homogeneous cluster supports only static processing: resources are fixed in advance and cannot be allocated dynamically, which is a major disadvantage when optimising response time to clients. Parallel data processing is increasingly common on today's internet, and it is important that a server delivers services to its clients in optimal time. To maximise client satisfaction, the server therefore needs to move away from the homogeneous cluster setup usually encountered in parallel data processing. The project also aims to ensure that each user's requirements are fulfilled completely and in optimal time. This is expected to improve overall resource utilization and, consequently, reduce processing cost.
