New improvement of the Hadoop relevant data locality scheduling algorithm based on LATE

Scheduling is currently an active research topic in cloud computing; its purpose is to coordinate cloud resources so that they are used fully and rationally. Data locality is one of the main properties of Hadoop as a cloud platform. This paper discusses that property and proposes a new data-locality-aware improvement to the Hadoop scheduling algorithm, based on LATE. The algorithm mainly addresses a performance problem in backing up slow tasks: while a backup task runs, reading its input data takes most of the time and eventually limits its processing speed. Finally, experiments were carried out on the algorithm and its function was analyzed; the results verify that the algorithm improves response time and overall system throughput.
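The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes the usual LATE estimate of remaining time (remaining progress divided by observed progress rate) and adds one assumed locality rule: among the slowest running tasks, prefer a backup whose input block already resides on the node that has a free slot, so the backup copy does not spend its time reading data over the network. The names `Task`, `replica_nodes`, `pick_backup`, and `slow_fraction` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    progress: float   # progress score in [0, 1]
    elapsed: float    # seconds since the task started
    # Hypothetical field: nodes that hold a replica of this task's input block.
    replica_nodes: set = field(default_factory=set)

    def time_left(self) -> float:
        """LATE-style estimate: remaining progress / observed progress rate."""
        if self.progress <= 0.0:
            return float("inf")
        rate = self.progress / self.elapsed
        return (1.0 - self.progress) / rate


def pick_backup(tasks, free_node, slow_fraction=0.25):
    """Pick a running task to back up on free_node, or None if none qualifies.

    Assumed policy (not taken from the paper): consider only the slowest
    `slow_fraction` of running tasks by progress rate; among those, prefer
    tasks whose input is local to free_node, then the longest time left.
    """
    running = [t for t in tasks if t.progress < 1.0 and t.elapsed > 0]
    if not running:
        return None
    by_rate = sorted(running, key=lambda t: t.progress / t.elapsed)
    n_slow = max(1, int(len(by_rate) * slow_fraction))
    candidates = by_rate[:n_slow]
    return max(
        candidates,
        key=lambda t: (free_node in t.replica_nodes, t.time_left()),
    )
```

With this ordering, a slow task whose data is local to the free node is chosen ahead of an even slower task whose data is remote. Whether locality should outrank estimated time left in this way is a design choice made for the sketch, not something the abstract states.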
