As MapReduce becomes increasingly popular in data processing applications, the demand for Hadoop clusters grows. However, Hadoop is incompatible with existing cluster batch job queuing systems and requires a dedicated cluster under its full control. Hadoop also lacks support for user access control, accounting, fine-grained performance monitoring, and legacy batch job processing facilities comparable to those of existing cluster job queuing systems. This makes dedicated Hadoop clusters less suitable for administrators and ordinary users alike whose hybrid computing needs involve both MapReduce and legacy applications. As a result, obtaining a suitable, properly sized Hadoop cluster has not been easy in organizations with existing clusters. This paper presents Cloud BATCH, a prototype solution to this problem that enables Hadoop to function as a traditional batch job queuing system with enhanced functionality for cluster resource management. With Cloud BATCH, a complete shift to Hadoop for managing an entire cluster that serves hybrid computing needs becomes feasible.