AN EFFICIENT IMPLEMENTATION OF HANDLING HUGE DATA IN DISTRIBUTED COMPUTING ENVIRONMENT

By distributing a computation across a number of machines, it can be completed in a fraction of the time required to run the same computation on a single machine. However, distributing a program over a number of heterogeneous machines proves to be a tedious and difficult task. The objective of our work is to build a distributed system and implement a time-effective method for processing huge amounts of data. A single user submits a task to the server, which divides the task among different clients based on each client's available memory. Our proposed algorithm decides how much data to send to each client, depending on the number of clients and their available memory. The server then collects the individual outputs processed by the clients and merges them to generate the final output. We measured the processing time of the system while varying the number of clients and present a performance graph at the end of this paper. The timing statistics clearly show that as the number of clients increases, the whole system takes less time than a single computer to process huge data.
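The memory-proportional split described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name `partition_by_memory` and the record/memory figures are assumptions, and the real system would transfer the data over the network rather than just compute counts.

```python
def partition_by_memory(total_records, client_memory):
    """Split total_records among clients in proportion to their free memory.

    client_memory: dict mapping client id -> available memory (e.g. in MB).
    Returns a dict mapping client id -> number of records to send.
    """
    total_memory = sum(client_memory.values())
    shares = {}
    assigned = 0
    for cid, mem in client_memory.items():
        # Each client's share is proportional to its available memory.
        share = (total_records * mem) // total_memory
        shares[cid] = share
        assigned += share
    # Integer division may leave a few records unassigned; give the
    # remainder to the client with the most memory.
    biggest = max(client_memory, key=client_memory.get)
    shares[biggest] += total_records - assigned
    return shares

# Example: 1000 records, three clients with 512/256/256 MB free.
print(partition_by_memory(1000, {"c1": 512, "c2": 256, "c3": 256}))
# → {'c1': 500, 'c2': 250, 'c3': 250}
```

The server would then dispatch each client's slice, wait for the partial results, and concatenate or merge them into the final output.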